Methods and systems for determining autism spectrum disorder risk

ABSTRACT

In certain embodiments, the invention stems from the discovery that analysis of population distribution curves of metabolite levels in blood can be used to facilitate predicting risk of autism spectrum disorder (ASD) and/or to differentiate between ASD and non-ASD developmental delay (DD) in a subject. In certain aspects, information from assessment of the presence, absence, and/or direction (upper or lower) of a tail effect in a metabolite distribution curve is utilized to predict risk of ASD and/or to differentiate between ASD and DD.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 14/633,558 filed Feb. 27, 2015, which is a continuation-in-part application of U.S. patent application Ser. No. 14/493,141, filed on Sep. 22, 2014, which claims the benefit of U.S. Provisional Patent Application No. 61/978,773 filed Apr. 11, 2014, and U.S. Provisional Patent Application No. 62/002,169 filed May 22, 2014; the contents of each of which are incorporated by reference herein in their entirety.

FIELD OF THE INVENTION

The present invention relates generally to the prediction of risk for Autism Spectrum Disorder (ASD) and other disorders.

BACKGROUND

Autism Spectrum Disorders (ASD) are pervasive developmental disorders characterized by reciprocal social interaction deficits, language difficulties, and repetitive behaviors and restrictive interests that often manifest during the first 3 years of life. The etiology of ASD is poorly understood but is thought to be multifactorial, with both genetic and environmental factors contributing to disease development.

Data show that although the average age at which parents begin to suspect an ASD in their child is 20 months, the median age of diagnosis is not until 54 months. An important challenge from a clinical perspective is determining, as early as possible, whether a child has ASD and requires specialist referral for an autism treatment plan.

SUMMARY

Diagnosis of ASD is typically made by developmental pediatricians and other specialists only after careful assessment of children using criteria spelled out in the Diagnostic and Statistical Manual of Mental Disorders. Reliable diagnosis often entails intensive assessment of subjects by multiple experts, including developmental pediatricians, neurologists, psychiatrists, psychologists, speech and hearing specialists, and occupational therapists. Moreover, the median age of diagnosis of ASD is 54 months despite the fact that the average age at which parents suspect ASD is as early as 20 months. The CDC (Centers for Disease Control and Prevention) has observed that only 18% of children who end up with an ASD diagnosis are identified by age 36 months. Regrettably, young children suffering from undiagnosed ASD miss an opportunity to benefit from early therapeutic intervention during an important window of childhood development. A medical diagnostic test to reliably determine ASD risk is needed, particularly to identify younger children earlier, when therapeutic intervention is likely to be more effective.

Embodiments of the present invention stem from the discovery that analysis of distribution curves of measured analytes, such as metabolites, within and across populations provides information that can be utilized to build or improve a classifier for prediction of risk for a condition or disorder, such as ASD. In particular, analysis of population distribution curves of metabolite levels in blood facilitates prediction of the risk of autism spectrum disorder (ASD) in a subject. For example, analysis of population distribution curves of metabolite levels in blood can be used to differentiate between autism spectrum disorder (ASD) and non-ASD developmental disorders in a subject, such as developmental delay (DD) not due to autism spectrum disorder.

The statistical analysis of a biomarker differentiating two groups usually assumes that the two populations differ in their mean biomarker levels and that variation around this mean is due to experimental and/or population variation best characterized by a Gaussian distribution. Contrary to this baseline model, it is observed herein that for some analytes, but not for others, the distribution in ASD, or sometimes in DD, is best characterized as itself composed of multiple sub-distributions: one sub-distribution that is essentially undifferentiated from the other health state (e.g., where ASD and DD distributions are undifferentiated), and another sub-distribution that is far removed from the mean in a minority of subjects, e.g., a “tail” of the combined distribution for that population. This insight leads to a significantly different analytic framework from the baseline; it is found that for certain analytes, better results are achieved by defining a threshold based on a top or bottom portion of the population distribution, e.g., by establishing a ranking that does not require an underlying Gaussian distribution model.

Thus, a metabolite is described herein as exhibiting a “tail enrichment” or “tail” effect where there is an enrichment of samples from a particular population (e.g., either ASD or DD) at a distal portion of the distribution curve of metabolite levels for that metabolite. Information from assessment of the presence, absence, and/or direction (upper or lower) of a tail effect in a metabolite distribution curve can be utilized to predict risk of ASD. It has been discovered that for particular metabolites, metabolite levels corresponding to a top or bottom portion (e.g., decile) of the distribution curve, i.e., within a “tail” of the distribution curve (whether in a “right tail” or “left tail”), are highly informative of the presence or absence of ASD.
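The tail-enrichment idea can be illustrated with a short rank-based computation. The sketch below is an illustration under stated assumptions, not the procedure used to obtain the results described herein: it pools hypothetical ASD and DD levels for one metabolite, takes the top (or bottom) decile by rank rather than by fitting a Gaussian, and compares the share of ASD samples inside that tail with the share of ASD samples overall. The function name `tail_enrichment` and the example values are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def tail_enrichment(asd: Sequence[float], dd: Sequence[float],
                    fraction: float = 0.10, upper: bool = True) -> tuple[float, float]:
    """Return (ASD share inside the tail, ASD share in the pooled population).

    The tail is the top (upper=True) or bottom (upper=False) `fraction` of the
    pooled ASD + DD levels, selected by rank; no distributional model is fitted.
    """
    pooled = [(v, "ASD") for v in asd] + [(v, "DD") for v in dd]
    pooled.sort(key=lambda item: item[0], reverse=upper)
    k = max(1, round(fraction * len(pooled)))  # number of samples in the tail
    asd_in_tail = sum(1 for _, label in pooled[:k] if label == "ASD")
    return asd_in_tail / k, len(asd) / len(pooled)


# Hypothetical levels: most ASD values overlap DD, but two ASD samples sit far out.
asd_levels = [1, 2, 3, 4, 5, 6, 7, 8, 20, 21]
dd_levels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(tail_enrichment(asd_levels, dd_levels))  # (1.0, 0.5)
```

Here the top decile is made up entirely of ASD samples even though ASD samples are only half of the pooled population, which is the kind of asymmetry a mean-difference test may understate.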

Furthermore, it is found that risk prediction improves as multiple metabolites having a low degree of overlapping (mutual) information are incorporated. For example, for assessment of ASD, there are particular groups of metabolites that provide complementary diagnostic/risk assessment information. That is, ASD-positive individuals who are identifiable by analysis of the level of a first metabolite (e.g., individuals within an identified tail of the first metabolite) are not the same as the ASD-positive individuals who are identifiable by analysis of a second metabolite (or there may be a low, non-zero degree of overlap). Without wishing to be bound to a particular theory, this discovery may be reflective of the multi-faceted nature of ASD itself.

Thus, in certain embodiments, the risk assessment method includes identifying whether a subject falls within any of a multiplicity of identified metabolite tails involving a plurality of metabolites, e.g., where the predictors of the different metabolite tails are at least partially disjoint, e.g., they have low mutual information, such that risk prediction improves as multiple metabolites with low mutual information are incorporated. The classifier has a predetermined level of predictability, e.g., in the form of AUC, i.e., the area under a ROC curve for the classifier that plots false positive rate (1-specificity) against true positive rate (sensitivity), where AUC increases upon addition of metabolites to the classifier that exhibit tail effects with low mutual information.
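A toy illustration of why AUC can rise when complementary tail metabolites are combined. AUC is computed here through its equivalence with the Mann-Whitney statistic: the probability that a randomly chosen ASD sample scores higher than a randomly chosen DD sample, with ties counting one half. The tail-hit indicators for the hypothetical metabolites A and B are invented for illustration, and scoring a subject by the number of ASD tails it falls in is only one possible classifier, assumed here for simplicity.

```python
def auc(pos_scores, neg_scores):
    """ROC AUC via the Mann-Whitney formulation: P(pos > neg) + 0.5 * P(pos == neg)."""
    if not pos_scores or not neg_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical tail-hit indicators: 1 = the subject lies in that metabolite's ASD tail.
asd_a = [1, 1, 0, 0, 0]
asd_b = [0, 0, 1, 1, 0]  # different ASD subjects than metabolite A (complementary)
dd_a = [1, 0, 0, 0, 0]
dd_b = [0, 1, 0, 0, 0]

# Score each subject by how many ASD tails it falls in.
asd_count = [a + b for a, b in zip(asd_a, asd_b)]
dd_count = [a + b for a, b in zip(dd_a, dd_b)]

print(auc(asd_a, dd_a))          # 0.6 (metabolite A alone)
print(auc(asd_b, dd_b))          # 0.6 (metabolite B alone)
print(auc(asd_count, dd_count))  # 0.7 (A and B combined)
```

Because A and B flag different ASD subjects, their combination separates more ASD/DD pairs than either alone; if both metabolites flagged the same subjects, the count score would add little.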

In some embodiments, the invention stems from the discovery that certain threshold values of metabolite levels in blood can be used to facilitate predicting risk of autism spectrum disorder (ASD) in a subject. In certain aspects, these threshold values of metabolites, deduced from assessment of the presence, absence, and/or direction (upper or lower) of a tail effect in a metabolite distribution curve, are utilized to predict risk of ASD. In certain aspects, these threshold values could be at either the upper or lower end of the distribution of metabolite levels in a population. It has been discovered that, for particular metabolites, levels of the metabolite above an upper threshold value and/or below a lower threshold value are highly informative of the presence or absence of ASD.

In some embodiments, levels of these metabolites are useful in distinguishing ASD from other forms of developmental delay (e.g., developmental delay (DD) not due to autism spectrum disorder).

In one aspect, the invention is directed to a method of differentiating between autism spectrum disorder (ASD) and non-ASD developmental delay (DD) in a subject, the method comprising: (i) measuring the level of a first metabolite of a plurality of metabolites from a sample obtained from the subject, the population distributions of the first metabolite being previously characterized in a first population of subjects with ASD and in a second population of subjects with non-ASD developmental delay (DD), wherein the first metabolite is predetermined to exhibit an ASD tail effect and/or a DD tail effect, each tail effect comprising an associated right tail or left tail enriched in members of the corresponding (ASD or DD) population, and where the first metabolite exhibits an ASD tail effect with a right tail, the level of the first metabolite in the sample is within the ASD tail when the level of the first metabolite in the sample is greater than a predetermined upper (minimum) threshold defining the right tail enriched in first (ASD) population members, and, where the first metabolite exhibits an ASD tail effect with a left tail, the level of the first metabolite in the sample is within the ASD tail when the level of the first metabolite in the sample is less than a predetermined lower (maximum) threshold defining the left tail enriched in first (ASD) population members, and where the first metabolite exhibits a DD tail effect with a right tail, the level of the first metabolite in the sample is within the DD tail when the level of the first metabolite in the sample is greater than a predetermined upper (minimum) threshold defining the right tail enriched in second (DD) population members, and, where the first metabolite exhibits a DD tail effect with a left tail, the level of the first metabolite in the sample is within the DD tail when the level of the first metabolite in the sample is less than a predetermined lower (maximum) threshold defining the left tail enriched in second (DD) population members; (ii) measuring the level of at least one additional metabolite of the plurality of metabolites from the sample, the population distribution of each of the at least one additional metabolite being previously characterized in the first population and in the second population and predetermined to exhibit at least one of an ASD tail effect and a DD tail effect, and, for each of the at least one additional metabolite, identifying whether the level of said metabolite in the sample is within the corresponding ASD tail and/or DD tail, according to step (i); and (iii) determining with a predetermined level of predictability that (a) the subject has ASD and not DD or (b) the subject has DD and not ASD, based on the identified ASD tails and/or the identified DD tails within which the sample lies for the metabolites analyzed in step (i) and step (ii).
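Steps (i) and (ii) amount to checking, for each measured metabolite, whether the sample level lies beyond a predetermined threshold on the relevant side. A minimal sketch, assuming each predetermined tail is described by a metabolite name, the enriched group, the side, and a threshold; the `TailSpec` structure, the metabolite names `met_A` to `met_C`, and the threshold values are hypothetical placeholders, not values from this disclosure. Strict inequalities follow the "greater than" and "less than" wording of the method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TailSpec:
    """One predetermined tail for one metabolite (all field values hypothetical)."""
    metabolite: str
    group: str       # population enriched in the tail: "ASD" or "DD"
    side: str        # "right": level > threshold; "left": level < threshold
    threshold: float


def tails_hit(sample, specs):
    """Return the ASD and DD tails (by metabolite name) that a sample lies within."""
    hits = {"ASD": [], "DD": []}
    for spec in specs:
        level = sample.get(spec.metabolite)
        if level is None:
            continue  # metabolite not measured in this sample
        if spec.side == "right":
            inside = level > spec.threshold
        elif spec.side == "left":
            inside = level < spec.threshold
        else:
            raise ValueError(f"unknown side: {spec.side!r}")
        if inside:
            hits[spec.group].append(spec.metabolite)
    return hits


# Hypothetical tails in arbitrary units; real thresholds come from reference data.
SPECS = [
    TailSpec("met_A", "ASD", "right", 1.8),
    TailSpec("met_B", "ASD", "left", 0.4),
    TailSpec("met_C", "DD", "right", 2.5),
]

print(tails_hit({"met_A": 2.1, "met_B": 0.9, "met_C": 1.0}, SPECS))
# {'ASD': ['met_A'], 'DD': []}
```

The returned lists are the input to step (iii); how the identified ASD and DD tails are combined into a call is left open here.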

In certain embodiments, the first metabolite is predetermined to exhibit an ASD tail effect with an associated upper (minimum) or lower (maximum) threshold, said threshold predetermined such that the odds that a sample of unknown classification (a previously uncharacterized sample) meeting this criterion is ASD as opposed to DD are no less than 1.6:1 with p<0.3. In certain embodiments, the odds are no less than 2:1, or no less than 2.5:1, or no less than 2.75:1, or no less than 3:1, or no less than 3.25:1, or no less than 3.5:1, or no less than 3.75:1, or no less than 4:1. In any of the preceding, the p-value (statistical significance value) satisfies p<0.3, or p<0.25, or p<0.2, or p<0.15, or p<0.1, or p<0.05.
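One way to check a candidate threshold against criteria of this kind is to tabulate reference samples inside and outside the tail and compute an odds ratio with a one-sided Fisher exact test. This sketch assumes that "odds" is read as the odds ratio of a 2x2 table (ASD/DD by inside/outside the tail); the disclosure may define the odds differently, and the counts shown are invented. The p-value is the upper tail of the hypergeometric distribution, computed with `math.comb` so no third-party library is needed; the function name is hypothetical.

```python
from math import comb


def tail_odds_and_p(asd_in, dd_in, asd_out, dd_out):
    """Odds ratio and one-sided Fisher exact p-value for ASD enrichment in a tail.

    Arguments are reference-sample counts: ASD/DD inside the tail and outside it.
    """
    denom = dd_in * asd_out
    odds = float("inf") if denom == 0 else (asd_in * dd_out) / denom
    n_asd, n_dd = asd_in + asd_out, dd_in + dd_out
    n_tail = asd_in + dd_in
    # P(X >= asd_in), X ~ hypergeometric: n_tail draws from n_asd ASD + n_dd DD samples.
    tail_ways = sum(
        comb(n_asd, k) * comb(n_dd, n_tail - k)
        for k in range(asd_in, min(n_asd, n_tail) + 1)
    )
    p = tail_ways / comb(n_asd + n_dd, n_tail)
    return odds, p


# Invented counts: 15 reference samples above the threshold, 12 of them ASD.
odds, p = tail_odds_and_p(asd_in=12, dd_in=3, asd_out=88, dd_out=97)
print(round(odds, 2), p < 0.05)  # 4.41 True
```

Swapping the roles of the ASD and DD counts gives the corresponding check for a DD tail.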

In certain embodiments, the first metabolite is predetermined to exhibit a DD tail effect with an associated upper (minimum) or lower (maximum) threshold, said threshold predetermined such that the odds that a sample of unknown classification (a previously uncharacterized sample) meeting this criterion is DD as opposed to ASD are no less than 1.6:1 with p<0.3. In certain embodiments, the odds are no less than 2:1, or no less than 2.5:1, or no less than 2.75:1, or no less than 3:1, or no less than 3.25:1, or no less than 3.5:1, or no less than 3.75:1, or no less than 4:1. In any of the preceding, the p-value (statistical significance value) satisfies p<0.3, or p<0.25, or p<0.2, or p<0.15, or p<0.1, or p<0.05.

In certain embodiments, the predetermined level of predictability corresponds to a Receiver Operating Characteristic (ROC) curve that plots false positive rate (1-specificity) against true positive rate (sensitivity) having an AUC (area under the curve) of at least 0.70.

In certain embodiments, the predetermined upper (minimum) threshold for one or more of the metabolites is a percentile from the 85th to the 95th percentile (e.g., about the 90th percentile, or about the 85th, 86th, 87th, 88th, 89th, 91st, 92nd, 93rd, 94th, or 95th percentile, rounded to the nearest percentile), and wherein the predetermined lower (maximum) threshold for one or more of the metabolites is a percentile from the 10th to the 20th percentile (e.g., about the 15th percentile, or about the 10th, 11th, 12th, 13th, 14th, 16th, 17th, 18th, 19th, or 20th percentile, rounded to the nearest percentile).
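Percentile thresholds such as these can be derived from a reference population's measured levels. The sketch uses the nearest-rank definition of a percentile, which is only one of several conventions (interpolating definitions give slightly different cut points); the function names and the 15th/90th-percentile defaults, chosen from within the ranges above, are illustrative assumptions.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def tail_thresholds(reference_levels, lower_pct=15, upper_pct=90):
    """Return (lower/maximum, upper/minimum) tail thresholds for one metabolite."""
    return (nearest_rank_percentile(reference_levels, lower_pct),
            nearest_rank_percentile(reference_levels, upper_pct))


# Hypothetical reference levels 1..100 (arbitrary units).
print(tail_thresholds(list(range(1, 101))))  # (15, 90)
```

A sample level above the upper value would then fall in the right tail and one below the lower value in the left tail.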

In certain embodiments, the plurality of metabolites comprises at least two metabolites selected from the group consisting of 5-hydroxyindoleacetate (5-HIAA), 1,5-anhydroglucitol (1,5-AG), 3-(3-hydroxyphenyl)propionate, 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC, hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine, lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate, pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate, xanthine, hydroxy-chlorothalonil, octenoylcarnitine, and 3-hydroxyhippurate.

In certain embodiments, the plurality of metabolites comprises at least two metabolites selected from the group consisting of phenylacetylglutamine, xanthine, octenoylcarnitine, p-cresol sulfate, isovalerylglycine, gamma-CEHC, indoleacetate, pipecolate, 1,5-anhydroglucitol (1,5-AG), lactate, 3-(3-hydroxyphenyl)propionate, 3-indoxyl sulfate, pantothenate (Vitamin B5), and hydroxy-chlorothalonil.

In certain embodiments, the plurality of metabolites comprises at least three metabolites selected from the group consisting of phenylacetylglutamine, xanthine, octenoylcarnitine, p-cresol sulfate, isovalerylglycine, gamma-CEHC, indoleacetate, pipecolate, 1,5-anhydroglucitol (1,5-AG), lactate, 3-(3-hydroxyphenyl)propionate, 3-indoxyl sulfate, pantothenate (Vitamin B5), and hydroxy-chlorothalonil.

In certain embodiments, the plurality of metabolites comprises at least one pair of metabolites selected from the pairs listed in Table 6.

In certain embodiments, the plurality of metabolites comprises at least one triplet of metabolites selected from the triplets listed in Table 7.

In certain embodiments, the plurality of metabolites comprises at least one pair of metabolites that, combined together as a set of two metabolites, provides an AUC of at least 0.62 (e.g., at least about 0.63, 0.64, or 0.65), where AUC is the area under a ROC curve that plots false positive rate (1-specificity) against true positive rate (sensitivity) for a classifier based only on the set of two metabolites.

In certain embodiments, the plurality of metabolites comprises at least one triplet of metabolites that, combined together as a set of three metabolites, provides an AUC of at least 0.66 (e.g., at least about 0.67 or 0.68), where AUC is the area under a ROC curve that plots false positive rate (1-specificity) against true positive rate (sensitivity) for a classifier based only on the set of three metabolites.

In another aspect, the invention is directed to a method of determining autism spectrum disorder (ASD) risk in a subject, the method comprising: (i) analyzing the level of a first metabolite of a plurality of metabolites from a sample obtained from the subject, the population distribution of the first metabolite being previously characterized in a reference population of subjects having known classifications, wherein the first metabolite is predetermined to exhibit an ASD tail effect comprising an associated right tail or left tail enriched in ASD members, and where the first metabolite exhibits an ASD tail effect with a right tail, the level of the first metabolite in the sample is within the ASD tail when the level of the first metabolite in the sample is greater than a predetermined upper (minimum) threshold defining the right tail enriched in ASD population members, and, where the first metabolite exhibits an ASD tail effect with a left tail, the level of the first metabolite in the sample is within the ASD tail when the level of the first metabolite in the sample is less than a predetermined lower (maximum) threshold defining the left tail enriched in ASD population members; (ii) measuring the level of at least one additional metabolite of the plurality of metabolites from the sample, the population distribution of each of the at least one additional metabolite being previously characterized in the reference population and predetermined to exhibit an ASD tail effect, and, for each of the at least one additional metabolite, identifying whether the level of said metabolite in the sample is within the corresponding ASD tail, according to step (i); and (iii) determining with a predetermined level of predictability the risk of the subject having ASD based on the identified ASD tails within which the sample lies for the metabolites analyzed in step (i) and step (ii).

In certain embodiments, the first metabolite is predetermined to exhibit an ASD tail effect with an associated upper (minimum) or lower (maximum) threshold, said threshold predetermined such that the odds that a sample of unknown classification (a previously uncharacterized sample) meeting this criterion is ASD as opposed to DD are no less than 1.6:1 with p<0.3. In certain embodiments, the odds are no less than 2:1, or no less than 2.5:1, or no less than 2.75:1, or no less than 3:1, or no less than 3.25:1, or no less than 3.5:1, or no less than 3.75:1, or no less than 4:1. In any of the preceding, the p-value (statistical significance value) satisfies p<0.3, or p<0.25, or p<0.2, or p<0.15, or p<0.1, or p<0.05.

In certain embodiments, the predetermined level of predictability corresponds to a Receiver Operating Characteristic (ROC) curve that plots false positive rate (1-specificity) against true positive rate (sensitivity) having an AUC (area under the curve) of at least 0.70.

In certain embodiments, the plurality of metabolites comprises at least two metabolites selected from the group consisting of 5-hydroxyindoleacetate (5-HIAA), 1,5-anhydroglucitol (1,5-AG), 3-(3-hydroxyphenyl)propionate, 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC, hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine, lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate, pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate, xanthine, hydroxy-chlorothalonil, octenoylcarnitine, and 3-hydroxyhippurate.

In another aspect, the invention is directed to a method of determining autism spectrum disorder (ASD) risk in a subject, comprising: (i) analyzing levels of a plurality of metabolites in a sample obtained from the subject, the plurality of metabolites comprising at least two metabolites selected from the group consisting of 5-hydroxyindoleacetate (5-HIAA), 1,5-anhydroglucitol (1,5-AG), 3-(3-hydroxyphenyl)propionate, 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC, hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine, lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate, pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate, xanthine, hydroxy-chlorothalonil, octenoylcarnitine, and 3-hydroxyhippurate; and (ii) determining the risk that the subject has ASD based on the quantified levels of the plurality of metabolites.

In certain embodiments, the subject is no greater than about 54 months of age. In certain embodiments, the subject is no greater than about 36 months of age.

In certain embodiments, the plurality of metabolites comprises at least two metabolites selected from the group consisting of phenylacetylglutamine, xanthine, octenoylcarnitine, p-cresol sulfate, isovalerylglycine, gamma-CEHC, indoleacetate, pipecolate, 1,5-anhydroglucitol (1,5-AG), lactate, 3-(3-hydroxyphenyl)propionate, 3-indoxyl sulfate, pantothenate (Vitamin B5), and hydroxy-chlorothalonil.

In certain embodiments, the plurality of metabolites comprises at least three metabolites selected from the group consisting of phenylacetylglutamine, xanthine, octenoylcarnitine, p-cresol sulfate, isovalerylglycine, gamma-CEHC, indoleacetate, pipecolate, 1,5-anhydroglucitol (1,5-AG), lactate, 3-(3-hydroxyphenyl)propionate, 3-indoxyl sulfate, pantothenate (Vitamin B5), and hydroxy-chlorothalonil.

In certain embodiments, the plurality of metabolites comprises at least one pair of metabolites selected from the pairs listed in Table 6.

In certain embodiments, the plurality of metabolites comprises at least one triplet of metabolites selected from the triplets listed in Table 7.

In certain embodiments, the plurality of metabolites comprises at least one pair of metabolites that, combined together as a set of two metabolites, provides an AUC of at least 0.62 (e.g., at least about 0.63, 0.64, or 0.65), where AUC is the area under a ROC curve that plots false positive rate (1-specificity) against true positive rate (sensitivity) for a classifier based only on the set of two metabolites.

In certain embodiments, the plurality of metabolites comprises at least one triplet of metabolites that, combined together as a set of three metabolites, provides an AUC of at least 0.66 (e.g., at least about 0.67 or 0.68), where AUC is the area under a ROC curve that plots false positive rate (1-specificity) against true positive rate (sensitivity) for a classifier based only on the set of three metabolites.

In certain embodiments, the sample is a plasma sample.

In certain embodiments, measuring the levels of metabolites comprises performing mass spectrometry. In certain embodiments, performing mass spectrometry comprises performing one or more members selected from the group consisting of pyrolysis mass spectrometry, Fourier-transform infrared spectrometry, Raman spectrometry, gas chromatography-mass spectroscopy, high pressure liquid chromatography/mass spectroscopy (HPLC/MS), liquid chromatography (LC)-electrospray mass spectroscopy, cap-LC-tandem electrospray mass spectroscopy, and ultrahigh performance liquid chromatography/electrospray ionization tandem mass spectrometry.

In another aspect, the invention is directed to a method of differentiating between autism spectrum disorder (ASD) and non-ASD developmental delay (DD) in a subject, comprising: (i) analyzing levels of a plurality of metabolites in a sample obtained from the subject, the plurality of metabolites comprising at least two metabolites selected from the group consisting of 5-hydroxyindoleacetate (5-HIAA), 1,5-anhydroglucitol (1,5-AG), 3-(3-hydroxyphenyl)propionate, 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC, hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine, lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate, pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate, xanthine, hydroxy-chlorothalonil, octenoylcarnitine, and 3-hydroxyhippurate, the levels and/or population distributions of the plurality of metabolites being previously characterized in a reference population; and (ii) determining with a predetermined level of predictability that (a) the subject has ASD and not DD or (b) the subject has DD and not ASD by comparing the levels of the plurality of metabolites from the sample from the subject with predetermined thresholds (e.g., thresholds determined from a reference population of samples having known classifications).

In certain embodiments, the invention provides methods for analyzing metabolites by assigning weights to different metabolites to reflect their respective contributions to risk prediction. In some embodiments, the weight assignment can be deduced from the biological functions of the metabolites (e.g., the pathways to which they belong), their clinical utility, or their significance in statistical or epidemiological analyses.
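As one hypothetical way to turn statistical significance into weights, each metabolite could be weighted by the natural log of its tail odds ratio, and a sample scored by summing the weights of the tails it falls in. Neither the log-odds weighting nor the default weight of 1.0 for metabolites without an assigned weight comes from this disclosure; both are assumptions for illustration, as are the metabolite names.

```python
import math


def weights_from_odds(odds_by_metabolite):
    """Hypothetical weighting: natural log of each metabolite's tail odds ratio."""
    return {m: math.log(o) for m, o in odds_by_metabolite.items()}


def weighted_tail_score(hit_metabolites, weights, default_weight=1.0):
    """Sum the weights of the metabolites whose tails the sample falls in."""
    return sum(weights.get(m, default_weight) for m in hit_metabolites)


# Hypothetical odds ratios: met_B carries no information (odds 1.0 -> weight 0.0).
w = weights_from_odds({"met_A": 4.0, "met_B": 1.0})
print(weighted_tail_score(["met_A", "met_B"], w))  # equals math.log(4.0)
```

Log-odds weights have the convenient property that an uninformative tail (odds ratio 1) contributes nothing, while pathway-based or clinically assigned weights could be substituted in the same dictionary.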

In certain embodiments, the invention provides methods for measuring metabolites using different techniques, including, but not limited to, a chromatography assay, a mass spectrometry assay, a fluorimetry assay, an electrophoresis assay, an immune-affinity assay, and an immunochemical assay.

In certain embodiments, the invention provides methods for determining autism spectrum disorder (ASD) risk in a subject, comprising analyzing levels of a plurality of metabolites from a sample from the subject; and determining with a predetermined level of predictability whether the subject has ASD instead of non-ASD developmental disorders based on the quantified levels of the plurality of metabolites.

In certain embodiments, the plurality of metabolites includes at least one metabolite selected from the group consisting of 5-hydroxyindoleacetate (5-HIAA), 1,5-anhydroglucitol (1,5-AG), 3-(3-hydroxyphenyl)propionate, 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC, hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine, lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate, pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate, xanthine, hydroxy-chlorothalonil, octenoylcarnitine, 3-hydroxyhippurate, and combinations thereof.

In certain embodiments, the plurality of metabolites includes at least two metabolites selected from the group consisting of 5-hydroxyindoleacetate (5-HIAA), 1,5-anhydroglucitol (1,5-AG), 3-(3-hydroxyphenyl)propionate, 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC, hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine, lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate, pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate, xanthine, hydroxy-chlorothalonil, octenoylcarnitine, 3-hydroxyhippurate, and combinations thereof.

In certain embodiments, the plurality of metabolites includes at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 metabolites selected from the group consisting of 5-hydroxyindoleacetate (5-HIAA), 1,5-anhydroglucitol (1,5-AG), 3-(3-hydroxyphenyl)propionate, 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC, hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine, lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate, pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate, xanthine, hydroxy-chlorothalonil, octenoylcarnitine, 3-hydroxyhippurate, and combinations thereof.

In certain embodiments, the plurality of metabolites includes additional metabolites. In some embodiments, the plurality of metabolites includes more than 21 metabolites.

In certain embodiments, the invention provides methods for differentiating between autism spectrum disorder (ASD) and non-ASD developmental disorders in a subject, comprising steps of analyzing levels of a plurality of metabolites from a sample from the subject, comparing the levels of the metabolites to their respective population distributions in a reference population, and determining with a predetermined level of predictability whether the subject has ASD instead of non-ASD developmental disorders by comparing the levels of the plurality of metabolites from the sample from the subject to the previously characterized levels and/or population distributions of the plurality of metabolites in the reference population.
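Comparing a sample's level with a metabolite's reference population distribution can be expressed as a percentile rank. A minimal sketch, assuming ties are counted as half (the "mid-rank" convention); the resulting rank could then be compared against percentile-based tail cut-offs. The function name and the reference values are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(reference_levels, level):
    """Percent of reference values below `level`, counting equal values as half."""
    if not reference_levels:
        raise ValueError("reference_levels must be non-empty")
    ordered = sorted(reference_levels)
    below = bisect_left(ordered, level)            # values strictly less than level
    equal = bisect_right(ordered, level) - below   # values equal to level
    return 100.0 * (below + 0.5 * equal) / len(ordered)


# Hypothetical reference levels 1..10 (arbitrary units).
reference = list(range(1, 11))
print(percentile_rank(reference, 5))     # 45.0
print(percentile_rank(reference, 10.5))  # 100.0
```

Sorting once per call keeps the sketch short; for many samples the reference list would normally be sorted once and reused.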

For example, in certain embodiments, the invention provides a diagnostic criterion including at least one metabolite that can predict the risk of ASD in a subject with a ROC curve having an AUC of at least 0.60, at least 0.65, at least 0.70, at least 0.75, at least 0.80, at least 0.85, or at least 0.90. AUC is the area under a ROC curve that plots false positive rate (1-specificity) against true positive rate (sensitivity) for the classifier.

In certain embodiments, at least one metabolite for analysis is selected from the group consisting of 5-hydroxyindoleacetate (5-HIAA), 1,5-anhydroglucitol (1,5-AG), 3-(3-hydroxyphenyl)propionate, 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC, hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine, lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate, pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate, xanthine, hydroxy-chlorothalonil, octenoylcarnitine, 3-hydroxyhippurate, and combinations thereof.

In certain embodiments, the at least one metabolite for analysis comprises two or more members (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 21) selected from the group consisting of 5-hydroxyindoleacetate (5-HIAA), 1,5-anhydroglucitol (1,5-AG), 3-(3-hydroxyphenyl)propionate, 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl sulfate, 4-ethylphenyl sulfate, 8-hydroxyoctanoate, gamma-CEHC, hydroxyisovaleroylcarnitine (C5), indoleacetate, isovalerylglycine, lactate, N1-Methyl-2-pyridone-5-carboxamide, p-cresol sulfate, pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate, xanthine, hydroxy-chlorothalonil, octenoylcarnitine, and 3-hydroxyhippurate, in which a non-ASD population distribution curve and an ASD population distribution curve are established for each of the metabolites (e.g., each of said metabolites demonstrating a tail effect).

In certain embodiments, a metabolite for analysis is selected from the group consisting of gamma-CEHC, xanthine, p-cresol sulfate, octenoylcarnitine, phenylacetylglutamine, and combinations thereof.

In certain embodiments, a metabolite for analysis is gamma-CEHC.

In certain embodiments, a metabolite for analysis is xanthine.

In certain embodiments, a metabolite for analysis is p-cresol sulfate.

In certain embodiments, a metabolite for analysis is octenoylcarnitine.

In certain embodiments, a metabolite for analysis is phenylacetylglutamine.

In certain embodiments, a metabolite for analysis is isovalerylglycine.

In certain embodiments, a metabolite for analysis is pipecolate.

In certain embodiments, a metabolite for analysis is indoleacetate.

In certain embodiments, a metabolite for analysis is hydroxy-chlorothalonil.

In certain embodiments, the plurality of metabolites comprises at least a first metabolite and a second metabolite that are complementary (e.g., ASD tail samples for the first and second metabolites are substantially non-overlapping, such that the predictors provided by the metabolites are partially disjoint and have low mutual information). In certain embodiments, risk prediction improves as multiple metabolites with low mutual information are incorporated.
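Complementarity between two metabolites can be quantified in more than one way. The sketch below computes (a) the mutual information, in bits, between two discrete per-sample indicators and (b) the Jaccard overlap between the sets of ASD samples falling in each metabolite's tail. Which quantity, which samples, and which indicators the disclosure actually uses is not specified in this section; the choice of these two measures, the function names, and the sample IDs are assumptions.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


def jaccard_overlap(tail_samples_1, tail_samples_2):
    """Fraction of samples shared by two tails (0 = fully disjoint, 1 = identical)."""
    s1, s2 = set(tail_samples_1), set(tail_samples_2)
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


print(mutual_information([1, 0, 1, 0], [1, 0, 1, 0]))  # 1.0 (fully redundant)
print(mutual_information([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.0 (independent)
print(jaccard_overlap({"s1", "s2"}, {"s2", "s3"}))      # one shared sample of three
```

Note that two tails that never share a sample still have non-zero mutual information as indicators, so the overlap measure and the mutual-information measure answer slightly different questions.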

In certain embodiments, the plurality of metabolites comprises two metabolites, wherein the two metabolites, combined together as a set of two metabolites, provide an AUC of at least 0.62, 0.63, 0.64, or 0.65.

In certain embodiments, the plurality of metabolites comprises three metabolites, wherein the three metabolites, combined together as a set of three metabolites, provide an AUC of at least 0.66, 0.67, or 0.68.

In certain embodiments, the invention provides methods of differentiating between autism spectrum disorder (ASD) and a non-ASD developmental disorder in a subject by analyzing levels of two groups of previously defined metabolites. In certain embodiments, the first group of metabolites represents metabolites that are closely associated with ASD, while the second group represents metabolites that are associated with a control condition (e.g., DD). By analyzing both groups of metabolites in a sample from a subject, the risk that the subject has ASD rather than the control condition can be determined by a variety of methods described in the present disclosure. For example, this can be achieved by comparing the aggregated ASD tail effects for the first group of metabolites to the aggregated non-ASD tail effects for the second group of metabolites.

In certain embodiments, the invention provides methods for differentiating between autism spectrum disorder (ASD) and non-ASD developmental delay (DD) in a subject, the method comprising: (i) measuring the levels of a plurality of metabolites in a sample obtained from the subject, wherein the plurality of metabolites comprises at least two metabolites selected from the group consisting of xanthine, gamma-CEHC, hydroxy-chlorothalonil, 5-hydroxyindoleacetate (5-HIAA), indoleacetate, p-cresol sulfate, 1,5-anhydroglucitol (1,5-AG), 3-(3-hydroxyphenyl)propionate, 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl sulfate, 4-ethylphenyl sulfate, hydroxyisovaleroylcarnitine (C5), isovalerylglycine, lactate, N1-Methyl-2-pyridone-5-carboxamide, pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate, 3-hydroxyhippurate, and combinations thereof; and (ii) calculating the number of metabolites in the sample with a level at or below a predetermined threshold concentration (a) indicative of ASD (ASD left tail effect) as defined in Table 9A, or (b) indicative of DD (DD left tail effect) as defined in Table 9B; and/or (iii) calculating the number of metabolites in the sample with a level at or above a predetermined threshold concentration (a) indicative of ASD (ASD right tail effect) as defined in Table 9A, or (b) indicative of DD (DD right tail effect) as defined in Table 9B; and (iv) determining that the subject has ASD or DD based on the numbers obtained in steps (ii) and/or (iii).
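The counting logic of steps (ii) to (iv) can be illustrated with a short Python sketch. Tables 9A and 9B are not reproduced in this section, so the metabolite names and thresholds below are hypothetical placeholders, and the rule used for step (iv), calling ASD or DD according to which tail count is larger and reporting ties as indeterminate, is an assumption made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TailRule:
    metabolite: str
    threshold: float  # concentration, e.g., ng/ml
    side: str         # "left": at or below threshold; "right": at or above

    def hit(self, level: float) -> bool:
        if self.side == "left":
            return level <= self.threshold
        return level >= self.threshold


def count_hits(sample: dict, rules: list) -> int:
    """Number of rules whose metabolite was measured and lies in the rule's tail."""
    return sum(1 for r in rules if r.metabolite in sample and r.hit(sample[r.metabolite]))


def call_asd_or_dd(sample: dict, asd_rules: list, dd_rules: list):
    """Assumed decision rule: the larger tail count wins; ties are indeterminate."""
    asd_count, dd_count = count_hits(sample, asd_rules), count_hits(sample, dd_rules)
    if asd_count > dd_count:
        label = "ASD"
    elif dd_count > asd_count:
        label = "DD"
    else:
        label = "indeterminate"
    return label, asd_count, dd_count


# Hypothetical stand-ins for Table 9A (ASD tails) and Table 9B (DD tails).
ASD_RULES = [TailRule("metabolite_1", 10.0, "right"),
             TailRule("metabolite_2", 2.0, "left"),
             TailRule("metabolite_3", 50.0, "right")]
DD_RULES = [TailRule("metabolite_1", 1.0, "left"),
            TailRule("metabolite_4", 100.0, "right")]

sample = {"metabolite_1": 12.0, "metabolite_2": 1.5, "metabolite_3": 40.0, "metabolite_4": 80.0}
print(call_asd_or_dd(sample, ASD_RULES, DD_RULES))  # ('ASD', 2, 0)
```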

In certain embodiments, the invention provides methods for determining that a subject has or is at risk for ASD, the method comprising: (i) measuring the levels of a plurality of metabolites in a sample obtained from the subject, wherein the plurality of metabolites comprises at least two metabolites selected from the group consisting of xanthine, gamma-CEHC, hydroxy-chlorothalonil, 5-hydroxyindoleacetate (5-HIAA), indoleacetate, p-cresol sulfate, 1,5-anhydroglucitol (1,5-AG), 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl sulfate, 4-ethylphenyl sulfate, hydroxyisovaleroylcarnitine (C5), isovalerylglycine, lactate, N1-Methyl-2-pyridone-5-carboxamide, pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate, 3-hydroxyhippurate, and combinations thereof; and (ii) detecting two or more of: (a) xanthine at a level at or above 182.7 ng/ml; (b) hydroxy-chlorothalonil at a level at or above 20.3 ng/ml; (c) 5-hydroxyindoleacetate at a level at or above 28.5 ng/ml; (d) lactate at a level at or above 686600 ng/ml; (e) pantothenate at a level at or above 63.3 ng/ml; (f) pipecolate at a level at or above 303.6 ng/ml; (g) gamma-CEHC at a level at or below 32.0 ng/ml; (h) indoleacetate at a level at or below 141.4 ng/ml; (i) p-cresol sulfate at a level at or below 182.7 ng/ml; (j) 1,5-anhydroglucitol (1,5-AG) at a level at or below 11910.3 ng/ml; (k) 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF) at a level at or below 7.98 ng/ml; (l) 3-indoxyl sulfate at a level at or below 256.7 ng/ml; (m) 4-ethylphenyl sulfate at a level at or below 3.0 ng/ml; (n) hydroxyisovaleroylcarnitine (C5) at a level at or below 12.9 ng/ml; (o) N1-Methyl-2-pyridone-5-carboxamide at a level at or below 124.82 ng/ml; and (p) phenylacetylglutamine at a level at or below 166.4 ng/ml; and (iii) determining that the subject has or is at risk for ASD based on the metabolite levels detected in step (ii).
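The sixteen criteria of step (ii) map naturally onto a lookup table. The Python sketch below is offered only as an illustration of the counting step, not as a diagnostic implementation: it records each threshold from (a) to (p) with its direction and checks whether at least two criteria are met. The dictionary keys are plain-text labels chosen here, all levels are assumed to be in ng/ml, and treating "two or more criteria met" as the basis for step (iii) is an assumption, since the text leaves the form of that determination open.

```python
# Thresholds (ng/ml) from criteria (a)-(f): indicative of ASD at or above.
AT_OR_ABOVE = {
    "xanthine": 182.7,
    "hydroxy-chlorothalonil": 20.3,
    "5-hydroxyindoleacetate": 28.5,
    "lactate": 686600.0,
    "pantothenate": 63.3,
    "pipecolate": 303.6,
}
# Thresholds (ng/ml) from criteria (g)-(p): indicative of ASD at or below.
AT_OR_BELOW = {
    "gamma-CEHC": 32.0,
    "indoleacetate": 141.4,
    "p-cresol sulfate": 182.7,
    "1,5-anhydroglucitol": 11910.3,
    "CMPF": 7.98,
    "3-indoxyl sulfate": 256.7,
    "4-ethylphenyl sulfate": 3.0,
    "hydroxyisovaleroylcarnitine": 12.9,
    "N1-Methyl-2-pyridone-5-carboxamide": 124.82,
    "phenylacetylglutamine": 166.4,
}


def criteria_met(levels):
    """Names of measured metabolites whose level satisfies its criterion."""
    met = [m for m, t in AT_OR_ABOVE.items() if m in levels and levels[m] >= t]
    met += [m for m, t in AT_OR_BELOW.items() if m in levels and levels[m] <= t]
    return met


def two_or_more_met(levels):
    return len(criteria_met(levels)) >= 2


# Hypothetical measurements for one sample (ng/ml).
levels = {"xanthine": 200.0, "gamma-CEHC": 25.0, "lactate": 500000.0}
print(criteria_met(levels))     # ['xanthine', 'gamma-CEHC']
print(two_or_more_met(levels))  # True
```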

In certain embodiments, the invention provides methods for determining ASD risk in a subject by assessing both the levels of certain metabolites and genetic information from the subject. In some embodiments, the genetic information includes copy number variations (CNVs) and/or the results of Fragile X syndrome (FXS) testing.

In additional embodiments, limitations described with respect to certain aspects of the invention can be applied to other aspects of the invention. For example, the limitations of a claim depending from one independent claim may, in some embodiments, be applied to another independent claim.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates the distribution of an exemplary metabolite in two populations (e.g., ASD and DD), and the mean shift of this metabolite between these two populations.

FIG. 2 illustrates the distribution of an exemplary metabolite in two populations (e.g., ASD and DD), and a tail effect (e.g., the ASD distribution has a more densely populated tail) of this metabolite between these two populations.

FIG. 3 illustrates the distribution of the metabolite 5-hydroxyindoleacetate in two populations (e.g., ASD and DD), which exhibits a statistically significant mean shift (t-test; p<0.01) and a statistically significant right tail effect between the two populations (‘extremes’ signifies tail effect, Fisher's test; p=0.001).

FIG. 4 illustrates the distribution of the metabolite gamma-CEHC in two populations (e.g., ASD and DD), which exhibits a statistically significant left tail effect between the two populations (‘extremes’ signifies tail effect, Fisher's test; p=0.008).

FIG. 5 illustrates the distribution of the metabolite phenylacetylglutamine in two populations (e.g., ASD and DD), which exhibits a statistically significant mean shift (t-test; p=0.001) and statistically significant left and right tail effects between the two populations (‘extremes’ signifies tail effect, Fisher's test; p=0.0001). The distributions appear as shifted Gaussian curves in the two populations.

FIG. 6 illustrates the correlation of two exemplary metabolites and demonstrates that these two metabolites possess distinct profiles of tail effects and are complementary.

FIG. 7 illustrates the tail effects of 12 exemplary metabolites in 180 subjects, and their predictive power for ASD and DD.

FIG. 8A illustrates a plot of ASD and non-ASD tail effects for 180 samples using an exemplary 12-metabolite panel, demonstrating that samples from ASD patients show aggregation of ASD tail effects.

FIG. 8B illustrates a plot of ASD and non-ASD tail effects for 180 samples using an exemplary 12-metabolite panel, and an exemplary method of binning the data.

FIG. 8C illustrates a plot of ASD and non-ASD tail effects for 180 samples using an exemplary 21-metabolite panel, demonstrating that samples from ASD patients show aggregation of ASD tail effects.

FIG. 9 illustrates increases in the predictability of ASD for an exemplary 12-metabolite panel as the number of metabolites assessed increases.

FIG. 10A illustrates the effects of trichotomization on the predictability of ASD using an exemplary 12-metabolite panel.

FIG. 10B illustrates the effects of trichotomization on the predictability of ASD in the analysis of an exemplary 21-metabolite panel.

FIG. 11A illustrates an improvement in the predictability of ASD using voting methods compared to a non-voting method for analysis of an exemplary 12-metabolite panel.

FIG. 11B illustrates an improvement in the predictability of ASD using a voting method compared to a non-voting method using an exemplary 21-metabolite panel.

FIG. 12 illustrates the validation process for using an exemplary 12-metabolite panel to achieve a high predictability of ASD.

FIGS. 13A-13U illustrate the population distribution of 21 exemplary metabolites in an ASD population and a non-ASD population.

FIGS. 14A-B illustrate the effects on the predictability of ASD of the inclusion and exclusion of an exemplary 12-metabolite panel, an exemplary 21-metabolite panel, and a set of 84 candidate metabolites from a total set of 600 metabolites, as assessed by tail effect analysis and mean shift analysis. (Blacklist=excluded, Whitelist=included, mx_12=exemplary 12-metabolite panel, mx_targeted 21=exemplary 21-metabolite panel, mx_all_candidates=84 candidate metabolites, all features=total set of 600 metabolites)

FIGS. 14C-D illustrate the effects on the predictability of ASD of the inclusion (whitelists) and exclusion (blacklists) of an exemplary 12-metabolite panel and an exemplary 21-metabolite panel from a total set of 600 metabolites, as assessed by tail effect analysis and mean shift analysis and by comparing logistic regression to Bayes analysis, in two cohorts of samples (i.e., “Christmas” and “Easter”). (Blacklist=excluded, Whitelist=included, mx_12=exemplary 12-metabolite panel, mx_targeted 21=exemplary 21-metabolite panel, mx_all_candidates=84 candidate metabolites, all features=total set of 600 metabolites)

FIG. 15 illustrates the effects on the predictability of ASD of using an increasing number of metabolites selected from subsets of an exemplary 21-metabolite panel.

FIG. 16A illustrates the effects of adding genetic information to the tail effect analysis using an exemplary 12-metabolite panel, demonstrating improved power of separating ASD from non-ASD.

FIG. 16B illustrates the effects of adding genetic information to the tail effect analysis using an exemplary 21-metabolite panel, demonstrating improved power of separating ASD from non-ASD.

FIGS. 17A-B illustrate the effects on the predictability of ASD of the inclusion and exclusion of an exemplary 21-metabolite panel from the total set of metabolites, by comparing tail effect analysis to mean shift analysis, and by comparing logistic regression to Bayes analysis. (Blacklist=excluded, Whitelist=included, mx_12=exemplary 12-metabolite panel, mx_targeted 21=exemplary 21-metabolite panel, mx_all_candidates=84 candidate metabolites, all features=total set of 600 metabolites)

FIGS. 18A-B illustrate the effects on the predictability of ASD of the inclusion and exclusion of an exemplary 21-metabolite panel from the total set of metabolites, by comparing tail effect analysis to mean shift analysis, and by using logistic regression in two cohorts (i.e., “Christmas” and “Easter”). (Blacklist=excluded, Whitelist=included, mx_12=exemplary 12-metabolite panel, mx_targeted 21=exemplary 21-metabolite panel, mx_all_candidates=84 candidate metabolites, all features=total set of 600 metabolites)

FIGS. 19A-D illustrate the effects on the predictability of ASD of the inclusion and exclusion of an exemplary 21-metabolite panel from the total set of metabolites, by comparing tail effect analysis to mean shift analysis, and by comparing logistic regression to Bayes analysis using either the “Christmas” cohort, the “Easter” cohort, or both combined. (Blacklist=excluded, Whitelist=included, mx_12=exemplary 12-metabolite panel, mx_targeted 21=exemplary 21-metabolite panel, mx_all_candidates=84 candidate metabolites, all features=total set of 600 metabolites)

FIG. 20 illustrates a representative plot of the specificity and sensitivity of tail effect analysis for an exemplary 21-metabolite panel for prediction of ASD.

FIG. 21 illustrates a scoring system in which a risk score for ASD or DD is calculated based on the sum of the log2 values of odds ratios for ASD and DD predictive metabolites.
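A scoring system of the kind described for FIG. 21 can be sketched in a few lines of Python. The sketch assumes that each ASD-predictive metabolite found in its tail adds its log2 odds ratio, each DD-predictive metabolite found in its tail subtracts its log2 odds ratio, and a positive total leans toward ASD; this sign convention, the metabolite names, and the odds-ratio values are hypothetical and are not taken from the figure.

```python
from math import log2


def log2_or_score(asd_hits, dd_hits, asd_odds_ratios, dd_odds_ratios):
    """Sum of log2(OR) over ASD-predictive hits minus the same over DD-predictive hits.

    Positive scores lean toward ASD, negative toward DD (assumed convention).
    """
    asd_part = sum(log2(asd_odds_ratios[m]) for m in asd_hits)
    dd_part = sum(log2(dd_odds_ratios[m]) for m in dd_hits)
    return asd_part - dd_part


# Hypothetical odds ratios for metabolites whose tails predict ASD or DD.
ASD_OR = {"metabolite_1": 4.0, "metabolite_2": 2.0}
DD_OR = {"metabolite_3": 8.0}

print(log2_or_score(["metabolite_1", "metabolite_2"], [], ASD_OR, DD_OR))  # 3.0
print(log2_or_score(["metabolite_2"], ["metabolite_3"], ASD_OR, DD_OR))    # -2.0
```

Working on the log scale turns a product of odds ratios into a sum, so each metabolite contributes additively to the score.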

DEFINITIONS

In order for the present invention to be more readily understood, certain terms are first defined below. Additional definitions for the following terms and other terms are set forth throughout the specification.

In this application, unless otherwise clear from context, (i) the term “a” may be understood to mean “at least one”; (ii) the term “or” may be understood to mean “and/or”; (iii) the terms “comprising” and “including” may be understood to encompass itemized components or steps whether presented by themselves or together with one or more additional components or steps; (iv) the terms “about” and “approximately” may be understood to permit standard variation as would be understood by those of ordinary skill in the art; and (v) where ranges are provided, endpoints are included.

Agent: The term “agent” as used herein may refer to a compound or entity of any chemical class including, for example, polypeptides, nucleic acids, saccharides, lipids, small molecules, metals, or combinations thereof.

Approximately: As used herein, the terms “approximately” and “about” are intended to encompass normal statistical variation as would be understood by those of ordinary skill in the art as appropriate to the relevant context. In certain embodiments, the term “approximately” or “about” refers to a range of values that fall within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (greater than or less than) of the stated reference value unless otherwise stated or otherwise evident from the context (except where such number would exceed 100% of a possible value).

Area under curve (AUC): A classifier has an associated ROC curve (Receiver Operating Characteristic curve) that plots the true positive rate (sensitivity) against the false positive rate (1-specificity) across decision thresholds. The area under the ROC curve (AUC) is a measure of how well the classifier can distinguish between two diagnostic groups. A perfect classifier has an AUC of 1.0, as compared with a random classifier, which has an AUC of 0.5.
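Because the AUC equals the probability that a randomly chosen member of one group receives a higher score than a randomly chosen member of the other (the Mann-Whitney formulation), it can be computed without drawing the curve. The Python sketch below counts ties as one half, which matters when scores are small integers such as counts of tail effects; the function name and scores are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """AUC as P(positive > negative) + 0.5 * P(tie), over all positive/negative pairs."""
    total = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(positive_scores) * len(negative_scores))


# Hypothetical tail-effect counts for ASD (positive) and non-ASD (negative) samples.
print(auc([3, 4], [1, 2]))     # 1.0: perfect separation
print(auc([3, 2, 2], [1, 2]))  # about 0.833 (5 of 6 pairs, counting ties as half)
print(auc([1], [1]))           # 0.5: no separation
```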

Associated with: Two events or entities are “associated” with one another, as that term is used herein, if the presence, level and/or form of one is correlated with that of the other. For example, a particular entity is considered to be associated with a particular disease, disorder, or condition if its presence, level and/or form correlates with incidence of and/or susceptibility to the disease, disorder, or condition (e.g., across a relevant population).

Autism spectrum disorder: As used herein, the term “autism spectrum disorder” is recognized by those of skill in the art to refer to a developmental disorder on the autism “spectrum” characterized by one or more of reciprocal social interaction deficits, language difficulties, repetitive behaviors, and restrictive interests. Autism spectrum disorder has been characterized in the DSM-V (May 2013) as a disorder comprising a continuum of symptoms including, for example, communication deficits, such as responding inappropriately in conversations, misreading nonverbal interactions, and difficulty building friendships appropriate to age; overdependence on routines; high sensitivity to changes in the environment; and/or intense focus on inappropriate items. Autism spectrum disorder has additionally been characterized, for example, by DSM-IV-TR, to be inclusive of Autistic Disorder, Asperger's Disorder, Rett's Disorder, Childhood Disintegrative Disorder, and Pervasive Developmental Disorder Not Otherwise Specified (including Atypical Autism). In some embodiments, autism spectrum disorder (ASD) is characterized using standardized testing instruments such as questionnaires and observation schedules.
For example, in some embodiments, ASD is characterized by (i) a score meeting the cutoff for autism on Communication plus Social Interaction Total in the Autism Diagnostic Observation Schedule (ADOS) and a score meeting the cutoff value on Social Interaction, Communication, Patterns of Behavior, and Abnormality of Development at <36 months in the Autism Diagnostic Interview-Revised (ADI-R); and/or (ii) a score meeting the ASD cutoff on Communication and Social Interaction Total in ADOS and a score meeting the cutoff value on Social Interaction, Communication, Patterns of Behavior, and Abnormality of Development at <36 months in ADI-R, and (ii)(a) a score meeting the cutoff value for Social Interaction and Communication in ADI-R, or (ii)(b) a score meeting the cutoff value for Social Interaction or Communication and within 2 points of the cutoff value on Social Interaction or Communication (whichever did not meet the cutoff value) in ADI-R, or (ii)(c) a score within 1 point of the cutoff value for Social Interaction and Communication in ADI-R.

Classification: As used herein, “classification” is the process of learning to separate data points into different classes by finding common features between collected data points which are within known classes and then using mathematical methods or other methods to assign data points to one of the different classes. In statistics, classification is the problem of identifying the sub-population to which new observations belong, where the identity of the sub-population is unknown, on the basis of a training set of data containing observations whose sub-population is known. Thus the requirement is that new individual items are placed into groups based on quantitative information on one or more measurements, traits or characteristics, etc., and based on the training set in which previously decided groupings are already established. Classification has many applications. In some cases, it is employed as a data mining procedure, while in others more detailed statistical modeling is undertaken.

Classifier: As used herein, a “classifier” is a method, algorithm, computer program, or system for performing data classification. Examples of widely used classifiers include, but are not limited to, neural networks (e.g., multi-layer perceptrons), logistic regression, support vector machines, k-nearest neighbors, Gaussian mixture models, Gaussian naive Bayes and other naive Bayes classifiers, decision trees, partial least squares discriminant analysis (PLS-DA), Fisher's linear discriminant, perceptrons, quadratic classifiers, kernel estimation, boosting, Bayesian networks, hidden Markov models, and learning vector quantization.

Determine: Many methodologies described herein include a step of “determining”. Those of ordinary skill in the art, reading the present specification, will appreciate that such “determining” can utilize or be accomplished through use of any of a variety of techniques available to those skilled in the art, including for example specific techniques explicitly referred to herein. In some embodiments, determining involves manipulation of a physical sample. In some embodiments, determining involves consideration and/or manipulation of data or information, for example utilizing a computer or other processing unit adapted to perform a relevant analysis. In some embodiments, determining involves receiving relevant information and/or materials from a source. In some embodiments, determining involves comparing one or more features of a sample or entity to a comparable reference.

Determining risk: As used herein, determining risk includes calculating or quantifying a probability that a given subject has, or does not have, a particular condition or disorder. In some embodiments, a positive or negative diagnosis for a disorder or condition, for example, autism spectrum disorder (ASD) or developmental delay (DD), may be made based in whole or in part on a determined risk or risk score (e.g., an odds ratio, or range).

Developmental delay: As used herein, the phrase developmental delay (DD) refers to ongoing major or minor delay in one or more processes of child development, including, for example, physical development, cognitive development, communication development, social or emotional development, or adaptive development that is not due to autism spectrum disorder. Even though an individual with ASD may be considered to be developmentally delayed, the classification of ASD as used herein takes precedence over that of DD, such that the classifications of ASD and DD are mutually exclusive. In other words, unless indicated otherwise, the classification of DD is assumed to mean non-ASD developmental delay. In some embodiments, DD is characterized by non-autism (AU) and non-ASD, yet with (i) a score of 69 or lower on the Mullen Scale, a score of 69 or lower on the Vineland Scale, and a score of 14 or lower on the SCQ, or (ii) a score of 69 or lower on either the Mullen or the Vineland and within half a standard deviation of the cutoff value on the other assessment (score of 77 or lower).

Diagnostic information: As used herein, diagnostic information or information for use in diagnosis is any information that is useful in determining whether a patient has a disease or condition and/or in classifying the disease or condition into a phenotypic category or any category having significance with regard to prognosis of the disease or condition, or likely response to treatment (either treatment in general or any particular treatment) of the disease or condition. Similarly, diagnosis refers to providing any type of diagnostic information, including, but not limited to, whether a subject is likely to have a disease or condition (such as autism spectrum disorder), state, staging or characteristic of the disease or condition as manifested in the subject, information related to the nature or classification of the disorder, information related to prognosis and/or information useful in selecting an appropriate treatment. Selection of treatment may include the choice of a particular therapeutic agent or other treatment modality such as behavioral therapy, diet modification, etc., a choice about whether to withhold or deliver therapy, a choice relating to dosing regimen (e.g., frequency or level of one or more doses of a particular therapeutic agent or combination of therapeutic agents), etc.

Marker: A marker, as used herein, refers to an agent whose presence or level is associated with, or has a correlation to, a particular disease or condition. Alternatively or additionally, in some embodiments, a presence or level of a particular marker correlates with activity (or activity level) of a particular signaling pathway, for example that may be characteristic of a particular disorder. The marker may or may not play an etiological role in the disease or condition. The statistical significance of the presence or absence of a marker may vary depending upon the particular marker. In some embodiments, detection of a marker is highly specific in that it reflects a high probability that the disorder is of a particular subclass. According to the present invention, a useful marker need not distinguish disorders of a particular subclass with 100% accuracy.

Metabolite: As used herein, the term metabolite refers to a substance produced during a bodily chemical or physical process. The term “metabolite” includes any chemical or biochemical product of a metabolic process, such as any compound produced by the processing, cleavage, or consumption of a biological molecule. Examples of such molecules include, but are not limited to: acids and related compounds; mono-, di-, and tri-carboxylic acids (saturated, unsaturated aliphatic and cyclic, aryl, alkaryl); aldo-acids, keto-acids; lactone forms; gibberellins; abscisic acid; alcohols, polyols, derivatives, and related compounds; ethyl alcohol, benzyl alcohol, methanol; propylene glycol, glycerol, phytol; inositol, furfuryl alcohol, menthol; aldehydes, ketones, quinones, derivatives, and related compounds; acetaldehyde, butyraldehyde, benzaldehyde, acrolein, furfural, glyoxal; acetone, butanone; anthraquinone; carbohydrates; mono-, di-, tri-saccharides; alkaloids, amines, and other bases; pyridines (including nicotinic acid, nicotinamide); pyrimidines (including cytidine, thymine); purines (including guanine, adenine, xanthines/hypoxanthines, kinetin); pyrroles; quinolines (including isoquinolines); morphinans, tropanes, cinchonans; nucleotides, oligonucleotides, derivatives, and related compounds; guanosine, cytosine, adenosine, thymidine, inosine; amino acids, oligopeptides, derivatives, and related compounds; esters; phenols and related compounds; heterocyclic compounds and derivatives; pyrroles, tetrapyrroles (corrinoids and porphines/porphyrins, with or without metal ion); flavonoids; indoles; lipids (including fatty acids and triglycerides), derivatives, and related compounds; carotenoids, phytoene; and sterols, isoprenoids including terpenes; and modified versions of the above molecules. In some embodiments, a metabolite is the product of metabolism of an endogenous substance. In some embodiments, a metabolite is the product of metabolism of an exogenous substance.
In some embodiments, a metabolite is the product of metabolism of an endogenous substance and an exogenous substance. As used herein, the term “metabolome” refers to the chemical profile or fingerprint of the metabolites in a bodily fluid, a cell, a tissue, an organ, or an organism.

Metabolite distribution curve: As used herein, a metabolite distribution curve is a probability distribution curve defined by a function derived from metabolite level plotted against population density (e.g., ASD or DD). In some embodiments, the distribution curve is a standard curve fit of the data. In some embodiments, the distribution curve is a least squares polynomial curve fit. In some embodiments, the distribution curve is asymmetric, or non-Gaussian. In some embodiments, the distribution curve is simply a plot of cases with associated diagnostic category vs. metabolite values (e.g., a ‘rug plot’), where there is no curve fit.
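As one illustration of turning raw metabolite levels into a smooth distribution curve, the Python sketch below uses a Gaussian kernel density estimate with a normal-reference rule-of-thumb bandwidth. Kernel density estimation is a swapped-in technique chosen for brevity, not one of the fits named in the definition; the function name and data are hypothetical.

```python
import math


def gaussian_kde(values, bandwidth=None):
    """Return a density function estimated from `values` (needs at least two distinct points)."""
    n = len(values)
    if bandwidth is None:
        mean = sum(values) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-1 / 5)  # normal-reference rule of thumb
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observed level.
        return scale * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


# Hypothetical metabolite levels for one population.
levels = [1.0, 2.0, 3.0, 4.0, 5.0]
f = gaussian_kde(levels)

# Riemann-sum check that the estimated curve integrates to about 1.
step = 0.01
area = sum(f(-20 + i * step) for i in range(5001)) * step
```

Fitting one such curve to ASD levels and another to non-ASD levels, and comparing their upper and lower tails, is one way to visualize a tail effect.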

Mutual information: As used herein, mutual information refers to a measure of the mutual dependence of two variables (i.e., the degree to which knowing one variable reduces uncertainty about another variable). High mutual information indicates a large reduction in uncertainty; low mutual information indicates a small reduction; and zero mutual information between two random variables means the variables are independent.
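For discrete variables, such as binary indicators of whether each sample lies in a metabolite's tail, mutual information can be computed directly from joint and marginal frequencies. A short Python sketch (the function name and indicator values are hypothetical):

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X;Y), in bits, from paired discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


tail_a = [0, 0, 1, 1]
print(mutual_information(tail_a, tail_a))        # 1.0: knowing one fixes the other
print(mutual_information(tail_a, [0, 1, 0, 1]))  # 0.0: independent indicators
```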

Non-autism spectrum disorder (non-ASD): As used herein, non-autism spectrum disorder (non-ASD) refers to the classification of a child or adult who does not have an autism spectrum disorder. In some embodiments, “non-ASD” refers to normally developing subjects. In some embodiments, a non-ASD population consists of or comprises subjects with developmental delay (DD). In some embodiments, “non-ASD” consists of or comprises both DD and normally developing subjects.

Patient: As used herein, the term “patient” or “subject” refers to any organism to which a test or composition is or may be administered, e.g., for experimental, diagnostic, prophylactic, and/or therapeutic purposes. In some embodiments, a patient is suffering from or susceptible to one or more disorders or conditions. In some embodiments, a patient displays one or more symptoms of a disorder or condition. In some embodiments, a patient is suspected to have one or more disorders or conditions.

Predictability: As used herein, predictability refers to the degree to which a correct prediction or forecast of a subject's disease status can be made either qualitatively or quantitatively. Perfect predictability implies strict determinism, but lack of predictability does not necessarily imply lack of determinism. Limitations on predictability could be caused by factors such as a lack of information or excessive complexity.

Prognostic and predictive information: As used herein, the terms prognostic and predictive information are used interchangeably to refer to any information that may be used to indicate any aspect of the course of a disease or condition either in the absence or presence of treatment. Such information may include, but is not limited to, the likelihood that a patient will be cured of a disease and the likelihood that a patient's disease will respond to a particular therapy (wherein response may be defined in any of a variety of ways). Prognostic and predictive information are included within the broad category of diagnostic information.

Reference: The term “reference” is often used herein to describe a standard or control agent, individual, population, sample, sequence or value against which an agent, individual, population, sample, sequence or value of interest is compared. In some embodiments, a reference agent, individual, population, sample, sequence or value is tested and/or determined substantially simultaneously with the testing or determination of the agent, individual, population, sample, sequence or value of interest. In some embodiments, a reference agent, individual, population, sample, sequence or value is a historical reference, optionally embodied in a tangible medium. Typically, as would be understood by those skilled in the art, a reference agent, individual, population, sample, sequence or value is determined or characterized under conditions comparable to those utilized to determine or characterize the agent, individual, population, sample, sequence or value of interest.

Regression analysis: As used herein, “regression analysis” includes any techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are held fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution. Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. A large body of techniques for carrying out regression analysis has been developed. Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data.
Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional.
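A minimal example of the parametric case is simple linear regression fitted by ordinary least squares, in which the regression function is described by just two parameters, an intercept and a slope. The Python sketch below uses the closed-form estimates; the function name and data are hypothetical.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)  # 1.0 2.0
```

The fitted value `intercept + slope * x` is the model's estimate of the conditional mean of the dependent variable at `x`.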

Risk: As will be understood from context, a “risk” of a disease, disorder or condition is a degree of likelihood that a particular individual will be diagnosed with or will develop the disease, disorder, or condition. In some embodiments, risk is expressed as a percentage. In some embodiments, risk is from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 up to 100%. In some embodiments, risk is expressed as a risk relative to a risk associated with a reference sample or group of reference samples. In some embodiments, a reference sample or group of reference samples have a known risk of a disease, disorder, or condition. In some embodiments, a reference sample or group of reference samples are from individuals comparable to a particular individual. In some embodiments, relative risk is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more. In some embodiments, relative risk can be expressed as Relative Risk (RR) or Odds Ratio (OR).
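Both measures can be computed from a 2x2 table. The Python sketch below uses hypothetical counts and treats "exposed" as having a metabolite level in a given tail and "case" as ASD. In case-control sampling the proportion of cases is fixed by design, so risks, and hence the relative risk, cannot be estimated directly; the odds ratio is typically reported instead.

```python
def relative_risk_and_odds_ratio(a, b, c, d):
    """Relative risk and odds ratio from 2x2 table counts.

    a: exposed cases      b: exposed non-cases
    c: unexposed cases    d: unexposed non-cases
    """
    relative_risk = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return relative_risk, odds_ratio


# Hypothetical counts: 'exposed' = metabolite level in the tail, 'case' = ASD.
rr, odds_ratio = relative_risk_and_odds_ratio(a=20, b=10, c=30, d=60)
print(rr, odds_ratio)  # 2.0 4.0
```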

Sample: As used herein, the term “sample” typically refers to a biological sample obtained or derived from a source of interest, as described herein. In some embodiments, a source of interest comprises an organism, such as an animal or human. In some embodiments, a biological sample is or comprises biological tissue or fluid. In some embodiments, a biological sample may be or comprise bone marrow; blood; plasma; serum; blood cells; ascites; tissue or fine needle biopsy samples; cell-containing body fluids; free floating nucleic acids; sputum; saliva; urine; cerebrospinal fluid; peritoneal fluid; pleural fluid; feces; lymph; gynecological fluids; skin swabs; vaginal swabs; oral swabs; nasal swabs; washings or lavages such as ductal lavages or bronchoalveolar lavages; aspirates; scrapings; bone marrow specimens; tissue biopsy specimens; surgical specimens; other body fluids, secretions, and/or excretions; and/or cells therefrom, etc. In some embodiments, a biological sample is or comprises cells obtained from an individual. In some embodiments, obtained cells are or include cells from an individual from whom the sample is obtained. In some embodiments, a sample is a “primary sample” obtained directly from a source of interest by any appropriate means. For example, in some embodiments, a primary biological sample is obtained by methods selected from the group consisting of biopsy (e.g., fine needle aspiration or tissue biopsy), surgery, collection of body fluid (e.g., blood, lymph, feces, etc.), etc. In some embodiments, as will be clear from context, the term “sample” refers to a preparation that is obtained by processing (e.g., by removing one or more components of and/or by adding one or more agents to) a primary sample, for example, by filtering through a semi-permeable membrane.
Such a “processed sample” may comprise, forexample nucleic acids or proteins extracted from a sample or obtained bysubjecting a primary sample to techniques such as amplification orreverse transcription of mRNA, isolation and/or purification of certaincomponents, etc.

Subject: By “subject” is meant a mammal (e.g., a human, in some embodiments including prenatal human forms). In some embodiments, a subject is suffering from a relevant disease, disorder, or condition. In some embodiments, a subject is susceptible to a disease, disorder, or condition. In some embodiments, a subject displays one or more symptoms or characteristics of a disease, disorder, or condition. In some embodiments, a subject does not display any symptom or characteristic of a disease, disorder, or condition. In some embodiments, a subject is someone with one or more features characteristic of susceptibility to or risk of a disease, disorder, or condition. A subject can be a patient, which refers to a human presenting to a medical provider for diagnosis or treatment of a disease. In some embodiments, a subject is an individual to whom therapy is administered.

Substantially: As used herein, the term “substantially” refers to the qualitative condition of exhibiting total or near-total extent or degree of a characteristic or property of interest. One of ordinary skill in the biological arts will understand that biological and chemical phenomena rarely, if ever, go to completion and/or proceed to completeness or achieve or avoid an absolute result. The term “substantially” is therefore used herein to capture the potential lack of completeness inherent in many biological and chemical phenomena.

Suffering from: An individual who is “suffering from” a disease, disorder, or condition has been diagnosed with and/or exhibits or has exhibited one or more symptoms or characteristics of the disease, disorder, or condition.

Susceptible to: An individual who is “susceptible to” a disease, disorder, or condition is at risk for developing the disease, disorder, or condition. In some embodiments, such an individual is known to have one or more susceptibility factors that are statistically correlated with increased risk of development of the relevant disease, disorder, and/or condition. In some embodiments, an individual who is susceptible to a disease, disorder, or condition does not display any symptoms of the disease, disorder, or condition. In some embodiments, an individual who is susceptible to a disease, disorder, or condition has not been, or has not yet been, diagnosed with the disease, disorder, and/or condition. In some embodiments, an individual who is susceptible to a disease, disorder, or condition is an individual who has been exposed to conditions associated with development of the disease, disorder, or condition. In some embodiments, a risk of developing a disease, disorder, and/or condition is a population-based risk (e.g., family members of individuals suffering from allergy, etc.).

Tail enrichment and tail effect: As used herein, the terms “tail enrichment” or “tail effect” refer to a classification-enhancing property exhibited by a metabolite (or other analyte) that has a relatively high concentration of samples from a particular population at a distal portion of a distribution curve of metabolite levels. An “upper tail” or “right tail” refers to a distal portion of a distribution curve that is greater than the mean. A “lower tail” or “left tail” refers to a distal portion of a distribution curve that is lower than the mean. In some embodiments, a tail is determined by a predetermined threshold value based on ranking. For example, a sample is designated to be within a tail if its measurement for a certain metabolite is higher than the value corresponding to a percentile from the 85th to the 95th (e.g., the 90th) in a population for that metabolite, or is lower than the value corresponding to a percentile from the 10th to the 20th (e.g., the 15th) in the population for that metabolite.
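Below is a minimal Python sketch of the percentile-threshold rule. It makes the following assumptions:

- Percentiles are obtained by linear interpolation between ranked values, the same convention as NumPy's default percentile method. This also handles fractional percentiles.
- The "higher than" and "lower than" comparisons are strict.

The function names, the default percentiles (the 90th and 15th, taken from the example above), and the toy population are illustrative only.

```python
def percentile(values, q):
    """Percentile q (0-100, fractional allowed) by linear interpolation between ranked values."""
    xs = sorted(values)
    if not xs:
        raise ValueError("population is empty")
    if not 0.0 <= q <= 100.0:
        raise ValueError("q must be between 0 and 100")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)                     # floor, since pos >= 0
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def tail_label(level, population, upper_pct=90.0, lower_pct=15.0):
    """Return 'upper', 'lower', or 'none' for one sample's metabolite level."""
    upper_cut = percentile(population, upper_pct)
    lower_cut = percentile(population, lower_pct)
    if level > upper_cut:
        return "upper"   # right (upper) tail
    if level < lower_cut:
        return "lower"   # left (lower) tail
    return "none"


population = list(range(1, 101))             # toy metabolite levels 1..100
print(round(percentile(population, 90), 2))  # 90.1
print(tail_label(95, population))            # upper
print(tail_label(10, population))            # lower
print(tail_label(50, population))            # none
```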

Therapeutic agent: As used herein, the phrase “therapeutic agent” refers to any agent that has a therapeutic effect and/or elicits a desired biological and/or pharmacological effect when administered to a subject. In some embodiments, an agent is considered to be a therapeutic agent if its administration to a relevant population is statistically correlated with a desired or beneficial therapeutic outcome in the population, whether or not a particular subject to whom the agent is administered experiences the desired or beneficial therapeutic outcome.

Training set: As used herein, a “training set” is a set of data used in various areas of information science to discover potentially predictive relationships. Training sets are used in artificial intelligence, machine learning, genetic programming, intelligent systems, and statistics. In all these fields, a training set has much the same role and is often used in conjunction with a test set.

Test set: As used herein, a “test set” is a set of data used in various areas of information science to assess the strength and utility of a predictive relationship. Test sets are used in artificial intelligence, machine learning, genetic programming, intelligent systems, and statistics. In all these fields, a test set has much the same role. It is kept separate from the training set, so that a relationship discovered in the training data can be evaluated on data that were not used to discover it.
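Below is a minimal sketch of holding out a test set. It assumes a simple random split with a fixed seed, so that the split is reproducible. The function name `split_train_test` and its defaults are hypothetical and are not tied to any particular library.

```python
import random


def split_train_test(samples, test_fraction=0.25, seed=0):
    """Shuffle samples reproducibly and hold out a fraction as the test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)           # fixed seed -> reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]     # (training set, test set)


train, test = split_train_test(range(100), test_fraction=0.2, seed=42)
print(len(train), len(test))   # 80 20
```

With two diagnostic groups (e.g., ASD and DD), a split stratified by group is often preferred, so that both sets contain each group in similar proportions. This sketch does not stratify.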

Treatment: As used herein, the term “treatment” (also “treat” or “treating”) refers to any administration of a substance or therapy (e.g., behavioral therapy) that partially or completely alleviates, ameliorates, relieves, inhibits, delays onset of, reduces severity of, and/or reduces frequency, incidence, or severity of one or more symptoms, features, and/or causes of a particular disease, disorder, and/or condition. Such treatment may be of a subject who does not exhibit signs of the relevant disease, disorder, and/or condition and/or of a subject who exhibits only early signs of the disease, disorder, and/or condition. Alternatively or additionally, such treatment may be of a subject who exhibits one or more established signs of the relevant disease, disorder, and/or condition. In some embodiments, treatment may be of a subject who has been diagnosed as suffering from the relevant disease, disorder, and/or condition. In some embodiments, treatment may be of a subject known to have one or more susceptibility factors that are statistically correlated with increased risk of development of the relevant disease, disorder, and/or condition.

DETAILED DESCRIPTION

The present invention provides methods and systems for determining risk of autism spectrum disorder (ASD) in a subject based on specific analysis of metabolite levels in a sample, e.g., a blood sample or a plasma sample. Various aspects of the invention are described in detail in the following sections. The use of sections and headers is not meant to limit the invention. Each section can apply to any aspect of the invention. In this application, the use of “or” means “and/or” unless otherwise apparent.

Autism Spectrum Disorder

Criteria for a clinical diagnosis of autism spectrum disorder (ASD) have been set forth in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), published in May 2013.

ASD has additionally been characterized, for example, by the DSM-IV-TR, to be inclusive of Autistic Disorder, Asperger's Disorder, Rett's Disorder, Childhood Disintegrative Disorder, and Pervasive Developmental Disorder Not Otherwise Specified (including Atypical Autism).

In some embodiments, ASD is characterized by (i) a score meeting the cutoff for autism on the Communication plus Social Interaction Total of the Autism Diagnostic Observation Schedule (ADOS), and a score meeting the cutoff value on Social Interaction, Communication, Patterns of Behavior, and Abnormality of Development at <36 months on the Autism Diagnostic Interview-Revised (ADI-R); and/or (ii) a score meeting the ASD cutoff on the Communication and Social Interaction Total in ADOS, and a score meeting the cutoff value on Social Interaction, Communication, Patterns of Behavior, and Abnormality of Development at <36 months in ADI-R, and one of the following:

(ii)(a) a score meeting the cutoff value for Social Interaction and Communication in ADI-R;

(ii)(b) a score meeting the cutoff value for Social Interaction or Communication, and within 2 points of the cutoff value on whichever of Social Interaction or Communication did not meet the cutoff value, in ADI-R; or

(ii)(c) a score within 1 point of the cutoff value for Social Interaction and Communication in ADI-R.

Developmental Delay

Developmental delay (DD) is a major or minor delay in one or more processes of child development, that is not due to ASD. Such processes include, for example, physical development, cognitive development, communication development, social or emotional development, or adaptive development.

In some embodiments, DD is characterized by non-Autism (AU) and non-ASD with either of the following:

(i) a score of 69 or lower on the Mullen Scale, a score of 69 or lower on the Vineland Scale, and a score of 14 or lower on the Social Communication Questionnaire (SCQ); or

(ii) a score of 69 or lower on either the Mullen or the Vineland, and a score within half a standard deviation of the cutoff value on the other assessment (a score of 77 or lower).

Even though an individual with ASD may be considered to be developmentally delayed, the classification of ASD as used herein will be considered to trump that of DD. The classifications of ASD and DD are therefore mutually exclusive.

Risk Assessment of ASD

Children who present with symptoms of impaired language, behavioral, or social development are often seen by clinicians, most commonly in a primary care setting. These clinicians are often unable to determine whether the child has ASD or some other condition, disorder, or classification (e.g., DD). It is difficult to diagnose children, particularly at an age prior to extensive language development, and many primary care physicians do not have the ability or resources to make a differential diagnosis of their patients. For example, ASD may not be easily distinguished from other developmental disorders, conditions, or classifications, such as DD.

It is useful to assess risk of ASD in a subject (including probability of non-ASD and DD), and to differentiate ASD from DD. Risk assessment of ASD provides opportunities for early intervention and treatment. For example, a non-specialist physician may use ASD risk assessment to initiate a referral to a specialist. A specialist may use ASD risk assessment to prioritize further evaluation of patients. Assessment of ASD risk may also be used to establish a provisional diagnosis prior to a final diagnosis, during which time facilitative services can be provided to a high-risk child and his or her family.

Described herein are methods for determining risk of ASD in a subject. In some embodiments, determining ASD risk includes determining that the subject has a greater than about 50% chance of having ASD. In some embodiments, determining ASD risk includes determining that the subject has a greater than about 60%, 65%, 70%, 74%, 80%, 85%, 90%, 95%, or 98% chance of having ASD. In some embodiments, determining ASD risk includes determining that a subject has ASD. In some embodiments, determining ASD risk includes determining that a subject does not have ASD (i.e., non-ASD).

In some embodiments, the invention provides methods for differentiating ASD from a non-ASD classification (e.g., DD) in a subject. In some embodiments, differentiating ASD from the non-ASD classification/condition includes determining that the subject has a greater than about 60%, 65%, 70%, 74%, 80%, 85%, 90%, 95%, or 98% chance of having ASD instead of the non-ASD classification (i.e., chance of having ASD and not having the non-ASD classification). In some embodiments, the non-ASD classification is DD. In some embodiments, the non-ASD classification is “normal”.

In some embodiments, the invention provides methods for determining that a subject does not have either ASD or DD.

Analytical Methods

Described herein are methods for assessing ASD risk, or differentiating ASD from other non-ASD developmental disorders. In some embodiments, the risk assessment is based (at least in part) on measurement and characterization of metabolites in a sample from a subject, e.g., a blood sample. In some embodiments, a plasma sample is derived from the blood sample, and the plasma sample is analyzed.

Metabolites can be detected in a variety of ways, including assays based on chromatography and/or mass spectrometry, fluorimetry, electrophoresis, immunoaffinity, hybridization, immunochemistry, ultraviolet spectroscopy (UV), fluorescence analysis, radiochemical analysis, near-infrared spectroscopy (near-IR), nuclear magnetic resonance spectroscopy (NMR), light scattering analysis (LS), and nephelometry.

In some embodiments, the metabolites are analyzed by liquid or gas chromatography or ion mobility (electrophoresis), alone or coupled with mass spectrometry, or by mass spectrometry alone. Such methods have been used to identify and quantify biomolecules, such as cellular metabolites. (See, for example, Li et al., 2000; Rowley et al., 2000; and Kuster and Mann, 1998.)

Mass spectrometry methods may be based on, for example, quadrupole, ion-trap, or time-of-flight mass spectrometry, with single, double, or triple mass-to-charge scanning and/or filtering (MS, MS/MS, or MS³). These methods are preceded by appropriate ionization methods such as electrospray ionization, atmospheric pressure chemical ionization, atmospheric pressure photoionization, matrix-assisted laser desorption ionization (MALDI), or surface-enhanced laser desorption ionization (SELDI). (See, for example, International Patent Application Publication Nos. WO 2004056456 and WO 2004088309.)

In some embodiments, the first separation of metabolites from a biological sample can be achieved by using gas or liquid chromatography or ion mobility/electrophoresis. In some embodiments, the ionization for mass spectrometry procedures can be achieved by electrospray ionization, atmospheric pressure chemical ionization, or atmospheric pressure photoionization. In some embodiments, mass spectrometry instruments include quadrupole, ion-trap, time-of-flight, or Fourier transform instruments.

In some embodiments, metabolites are analyzed on a mass scale via a non-targeted ultrahigh performance liquid or gas chromatography/electrospray or atmospheric pressure chemical ionization tandem mass spectrometry platform optimized for the identification and relative quantification of the small-molecule complement of biological systems. (See, for example, Evans et al., Anal. Chem., 2009, 81, 6656-6667.)


In some embodiments, a blood sample containing metabolites of interest is centrifuged to separate plasma from other blood components. In certain embodiments, internal standards are unnecessary. In some embodiments, defined amounts of internal standards are added to the plasma (or to a portion of it), and then methanol is added to precipitate plasma components such as proteins. Precipitates are separated from the supernatant by centrifugation, and the supernatant is harvested. If the concentration of a metabolite of interest is to be increased for more accurate detection, the supernatant is evaporated and the residue is dissolved in the appropriate amount of solvent. If the concentration of a metabolite of interest is undesirably high, the supernatant is diluted in the appropriate solvent.

An appropriate amount of metabolite-containing sample is loaded onto a liquid chromatography column equilibrated with the appropriate mixture of mobile phase A and mobile phase B. In the case of reversed-phase liquid chromatography, mobile phase A typically is water, with or without a small amount of an additive such as formic acid, and mobile phase B typically is methanol or acetonitrile. An appropriate gradient of mobile phase A and mobile phase B is pumped through the column to separate metabolites of interest by retention time (i.e., time of elution from the column). As metabolites elute from the column, they are ionized and brought into the gas phase, and the ions are detected and quantified by mass spectrometry. Specificity of detection is achieved by double-filtering for a specific precursor ion and a specific product ion generated from the precursor ion.

Absolute quantification may be achieved in two steps. First, ion counts derived from the metabolite of interest are normalized to ion counts derived from known amounts of an internal standard for that metabolite. Second, the normalized ion count is compared to a calibration curve established with known amounts of pure metabolite and internal standards. Internal standards typically are stable-isotope labeled forms of the pure metabolite, or pure forms of a structural analogue of the metabolite. Alternatively, relative quantification of a given metabolite in arbitrary units may be calculated by normalization to a selected internal reference value (e.g., the median value for metabolite levels on all samples run from a given group).
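The absolute and relative quantification steps described above can be sketched as follows. The sketch assumes a linear calibration curve, fitted by ordinary least squares to analyte/internal-standard ion-count ratios. Instrument software may apply weighted or nonlinear calibration models, which this sketch omits. The function names, concentrations, and ion counts are hypothetical.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def absolute_concentration(analyte_counts, istd_counts, calibration):
    """Normalize analyte ion counts to the internal standard, then invert the calibration line."""
    slope, intercept = calibration
    ratio = analyte_counts / istd_counts
    return (ratio - intercept) / slope


def relative_levels(levels):
    """Express levels in arbitrary units relative to the group median."""
    ref = median(levels)
    return [x / ref for x in levels]


# Hypothetical calibration standards: known concentrations and the
# analyte/internal-standard ion-count ratio measured for each standard.
known_conc = [1.0, 2.0, 5.0, 10.0]
ratios = [500 / 1000, 1000 / 1000, 2500 / 1000, 5000 / 1000]
calibration = fit_line(known_conc, ratios)

print(absolute_concentration(1500, 1000, calibration))  # 3.0
print(relative_levels([2.0, 4.0, 8.0]))                # [0.5, 1.0, 2.0]
```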

In some embodiments, one or more metabolites are measured by immunoassay. Numerous specific immunoassay formats and variations thereof may be utilized for measurement of metabolites. (See, for example, E. Maggio, Enzyme-Immunoassay (1980) (CRC Press, Inc., Boca Raton, Fla.); see also the following U.S. patents:

- U.S. Pat. No. 4,727,022, “Methods for Modulating Ligand-Receptor Interactions and their Application”;
- U.S. Pat. No. 4,659,678, “Immunoassay of Antigens”;
- U.S. Pat. No. 4,376,110, “Immunometric Assays Using Monoclonal Antibodies”;
- U.S. Pat. No. 4,275,149, “Macromolecular Environment Control in Specific Receptor Assays”;
- U.S. Pat. No. 4,233,402, “Reagents and Method Employing Channeling”; and
- U.S. Pat. No. 4,230,767, “Heterogenous Specific Binding Assay Employing a Coenzyme as Label.”)

Antibodies can be conjugated to a solid support suitable for a diagnostic assay in accordance with known techniques, such as passive binding. Suitable supports include, e.g., beads such as protein A or protein G agarose, microspheres, plates, slides, or wells formed from materials such as latex or polystyrene. Antibodies as described herein may likewise be conjugated to detectable labels or groups in accordance with known techniques. Such labels include radiolabels (e.g., ³⁵S, ¹²⁵I, ¹³¹I), enzyme labels (e.g., horseradish peroxidase, alkaline phosphatase), and fluorescent labels (e.g., fluorescein, Alexa, green fluorescent protein).

Determination of ASD Risk

In some embodiments, methods of the present invention allow one of skill in the art to identify, diagnose, or otherwise assess subjects, based at least in part on measuring metabolite levels in samples obtained from those subjects. The subjects may not presently exhibit signs or symptoms of ASD and/or other developmental disorders, but nonetheless may be at risk for having or developing ASD and/or other developmental disorders.

In certain embodiments, levels of metabolites or other analytes (e.g., proteomic or genomic information) can be measured in a test sample and compared to normal control levels. Alternatively, they can be compared to levels in subjects having a developmental disorder, condition, or classification that is not ASD (e.g., non-ASD developmental delay, DD). In some embodiments, the term “normal control level” refers to the level of one or more metabolites, other analytes, or indices typically found in subjects not suffering from ASD, or not likely to have ASD or another developmental disorder. In some embodiments, a normal control level is a range or an index. In some embodiments, a normal control level is determined from a database of previously tested subjects. A difference in the level of one or more metabolites or other analytes, compared to a normal control level, can indicate that a subject has ASD or is at risk of developing ASD. Conversely, a lack of difference in the level of one or more metabolites or other analytes, compared to a normal control level, can indicate that the subject does not have ASD or is at low risk of developing ASD.

In some embodiments, a reference value is one that has been obtained from a control subject or population whose diagnosis is known, i.e., who has, or has not, been diagnosed with or identified as suffering from ASD. In some embodiments, a reference value is an index value or baseline value, such as, for example, a “normal control level” as described herein.

In some embodiments, a reference sample, index value, or baseline value is taken or derived from one or more subjects who have been exposed to treatment for ASD. It may also be taken or derived from one or more subjects who are at low risk of developing ASD, or from subjects who have shown improvements in ASD risk factors as a result of exposure to treatment. In some embodiments, a reference sample, index value, or baseline value is taken or derived from one or more subjects who have not been exposed to a treatment for ASD. In some embodiments, samples are collected from subjects who have received initial treatment for ASD and/or subsequent treatment for ASD to monitor the progress of the treatment.

In some embodiments, a reference value has been derived from risk prediction algorithms or computed indices from population studies of ASD. In some embodiments, a reference value is from subjects or populations that have a disease or disorder other than ASD, such as another developmental disorder, e.g., non-ASD Developmental Delay (DD).

In some embodiments, differences in the level of metabolites measured by the methods of the present invention comprise increases or decreases in the level of the metabolites, as compared to a normal control level, reference value, index value, or baseline value. Such increases or decreases may be measured relative to a reference value from a normal control population, a general population, or a population with another disease. In some embodiments, they are indicative of presence of ASD, progression of ASD, exacerbation of ASD, or amelioration of ASD or ASD symptoms. In some embodiments, they are indicative of an increase or decrease in the risk of developing ASD, or complications relating thereto. The increase or decrease can also be indicative of the success of one or more treatment regimens for ASD, or can indicate improvements or regression of ASD risk factors. The increase or decrease can be, for example, at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, or at least 50% of a reference value.
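As a simple illustration of expressing an increase or decrease relative to a reference value, the hypothetical helpers below compute a signed percent change. They then test it against one of the example thresholds listed above; a 20% threshold is assumed here.

```python
def percent_change(level, reference):
    """Signed change of a measured level relative to a reference value, in percent."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return 100.0 * (level - reference) / reference


def differs_from_reference(level, reference, min_percent=20.0):
    """True if the level is at least min_percent above or below the reference."""
    return abs(percent_change(level, reference)) >= min_percent


print(percent_change(12.0, 10.0))          # 20.0
print(percent_change(8.0, 10.0))           # -20.0
print(differs_from_reference(11.0, 10.0))  # False (only a 10% increase)
```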

In some embodiments, differences in the level of metabolites as described herein are statistically significant differences. “Statistically significant” refers to differences that are greater than what might be expected to happen by chance alone. Statistical significance can be determined by any method known in the art, for example by p-value.

The p-value is the probability of observing a difference between groups at least as large as the one measured, if there were in fact no true difference between the groups. For example, a p-value of 0.01 means that a difference at least this large would be expected by chance alone in only about 1 in 100 such experiments. The lower the p-value, the stronger the evidence that a measured difference between groups is not due to chance.

In some embodiments, a difference is considered to be statistically significant if the p-value is at or below 0.05. In some embodiments, a statistically significant p-value is at or below 0.04, 0.03, 0.02, 0.01, 0.005, or 0.001. In some embodiments, a statistically significant p-value is at or below 0.30, 0.25, 0.20, 0.15, or 0.10 (e.g., when identifying whether a single particular metabolite has additive predictive value in a classifier that includes other metabolites). In some embodiments, a p-value is determined by a t-test. In some embodiments, a p-value is obtained by Fisher's exact test. In some embodiments, statistical significance is achieved by analyzing combinations of several metabolites in panels, combined with mathematical algorithms, to achieve a statistically significant risk prediction.
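Tail membership can be tallied as a 2×2 table, e.g., ASD versus DD samples, inside versus outside a tail. Fisher's test, mentioned above, can then be applied to that table.

The sketch below assumes the two-sided Fisher's exact test, which sums the probabilities of all tables with the same margins that are no more probable than the observed one. It uses exact fractions to avoid rounding errors in that comparison, and requires Python 3.8+ for `math.comb`. SciPy users could instead call `scipy.stats.fisher_exact`; the function below is an illustrative stand-alone version, and its name is hypothetical.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Conditions on the row and column totals and sums the hypergeometric
    probabilities of every table that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(x):  # exact probability that the top-left cell equals x
        return Fraction(comb(col1, x) * comb(n - col1, row1 - x), total)

    observed = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p = Fraction(0)
    for x in range(lo, hi + 1):
        px = prob(x)
        if px <= observed:
            p += px
    return float(p)


# Rows: two groups; columns: samples inside / outside a tail (toy counts).
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```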

A classification test, assay, or method has an associated ROC curve (Receiver Operating Characteristic curve), which plots the true positive rate (sensitivity) against the false positive rate (1-specificity). The area under the ROC curve (AUC) is a measure of how well the classifier can distinguish between two diagnostic groups. The maximum AUC is 1.0, which corresponds to a perfect test. An AUC of 0.5 corresponds to no discrimination (e.g., of normal versus disease), equivalent to random guessing. As the AUC approaches 1.0, the accuracy of the test increases.
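Below is a minimal sketch of computing the AUC without tracing the curve explicitly. It relies on the fact that the AUC equals the probability that a randomly chosen positive sample (e.g., ASD) receives a higher score than a randomly chosen negative sample (e.g., DD), with ties counted as one half. The pairwise loop is quadratic in sample size and is meant only to make the definition concrete. The scores are made up.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via its Mann-Whitney (pairwise) interpretation."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0    # pair ranked correctly
            elif p == q:
                wins += 0.5    # tie counts as half
    return wins / (len(positive_scores) * len(negative_scores))


print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8 of 9 pairs correct, about 0.889
print(roc_auc([0.5, 0.5], [0.5, 0.5]))            # 0.5 (no discrimination)
```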

In some embodiments, a high degree of risk prediction accuracy is a test or assay wherein the AUC is at least 0.60. In some embodiments, a high degree of risk prediction accuracy is a test or assay wherein the AUC is at least 0.65, at least 0.70, at least 0.75, at least 0.80, at least 0.85, at least 0.90, or at least 0.95.

Predicting ASD Risk by Assessment of Tail Effects

In some embodiments, a mean difference of metabolite levels is assessed among or between populations, e.g., between an ASD population and a DD population, or compared to a normal control population. In some embodiments, metabolites from samples of a given population (e.g., ASD) are assessed for enrichment in a tail of a distribution curve. That is, it is determined whether a greater proportion of samples from a designated population (e.g., ASD) than from a second population (e.g., DD) resides in a tail of the distribution curve (i.e., a “tail effect”). In some embodiments, both mean differences and tail effects are identified and utilized.

In some embodiments, a tail is determined by a predetermined threshold value. For example, a sample is designated to be within a tail if its measurement for a certain metabolite is higher than the value corresponding to the 90th percentile in a population for that metabolite (right tail, or upper tail), or lower than the value corresponding to the 15th percentile (left tail, or lower tail).

In some embodiments, the threshold for a right (upper) tail for a given metabolite is the value corresponding to the 80th, 81st, 82nd, 83rd, 84th, 85th, 86th, 87th, 88th, 89th, 90th, 91st, 92nd, 93rd, 94th, 95th, 96th, 97th, 98th, or 99th percentile. A sample is then designated to be within the right tail if its measurement for the given metabolite is higher than the value associated with this percentile.

In some embodiments, the threshold for a left (lower) tail for a given metabolite is the value corresponding to the 25th, 24th, 23rd, 22nd, 21st, 20th, 19th, 18th, 17th, 16th, 15th, 14th, 13th, 12th, 11th, 10th, 9th, 8th, 7th, 6th, 5th, 4th, 3rd, 2nd, or 1st percentile. A sample is then designated to be within the left tail if its measurement for the given metabolite is lower than the value associated with this percentile.

Percentile values shown are inclusive of fractional values.
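Below is a minimal sketch of assessing tail enrichment for a single metabolite. It makes two assumptions:

- The tail thresholds are taken from the combined ASD and DD population; the text above leaves the reference population open.
- Percentiles are interpolated linearly between ranked values.

The function names and toy levels are hypothetical.

```python
def percentile(values, q):
    """Percentile q (0-100, fractional allowed) by linear interpolation between ranked values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def tail_fractions(asd_levels, dd_levels, upper_pct=90.0, lower_pct=15.0):
    """Fraction of ASD and of DD samples lying in each tail of the combined distribution."""
    combined = list(asd_levels) + list(dd_levels)
    upper_cut = percentile(combined, upper_pct)
    lower_cut = percentile(combined, lower_pct)

    def frac(levels, in_tail):
        return sum(1 for x in levels if in_tail(x)) / len(levels)

    return {
        "upper": (frac(asd_levels, lambda x: x > upper_cut),
                  frac(dd_levels, lambda x: x > upper_cut)),
        "lower": (frac(asd_levels, lambda x: x < lower_cut),
                  frac(dd_levels, lambda x: x < lower_cut)),
    }


dd = list(range(1, 11))                 # toy DD levels 1..10
asd = list(range(1, 8)) + [18, 19, 20]  # toy ASD levels with three high values
print(tail_fractions(asd, dd))
# {'upper': (0.2, 0.0), 'lower': (0.1, 0.1)}
```

In the toy data, 20% of ASD samples but no DD samples lie in the upper tail, while the lower tail is equally populated. Only the upper tail would therefore suggest enrichment. The in-tail and out-of-tail counts per group could then be tested for significance, e.g., with Fisher's exact test.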

In some embodiments, a distribution curve is generated from a plot of metabolite levels for one or more populations. In some embodiments, a distribution curve is generated from a single reference population, e.g., a general population. In some embodiments, distribution curves are generated from two populations, e.g., an ASD population and a non-ASD population, such as DD. In some embodiments, distribution curves are generated from three or more populations. For example, these may be an ASD population, a non-ASD population with another developmental disorder/condition/classification such as DD, and a healthy (e.g., no developmental disorder) control population. Metabolite distribution curves from each of the populations may be utilized to make more than one risk assessment (e.g., diagnosing ASD, diagnosing DD, differentiating between ASD and DD). The methods described herein for utilizing tail effects may be applied to more than two populations.

In some embodiments, a plurality of metabolites and their distributions are used for risk assessment. In some embodiments, levels of two or more metabolites are utilized to predict ASD risk. In some embodiments, at least two of the metabolites are selected from the metabolites listed in Table 1. In some embodiments, at least three of the metabolites are selected from the metabolites listed in Table 1. In some embodiments, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 21 metabolites selected from the metabolites listed in Table 1 are used to predict ASD risk.

Further discussion of Table 1 (Tables 1A through 1C) appears in the Examples section below.

TABLE 1A
Exemplary 21-metabolite panel with tail effects predictive of ASD vs. DD

Metabolite
3-(3-hydroxyphenyl)propionate
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF)
3-indoxyl sulfate
4-ethylphenyl sulfate
5-hydroxyindoleacetate
8-hydroxyoctanoate
gamma-CEHC
hydroxyisovaleroylcarnitine (C5)
indoleacetate
isovalerylglycine
lactate
N1-Methyl-2-pyridone-5-carboxamide
p-cresol sulfate
pantothenate (Vitamin B5)
phenylacetylglutamine
pipecolate
xanthine
hydroxy-chlorothalonil
octenoylcarnitine
3-hydroxyhippurate
1,5-anhydroglucitol (1,5-AG)

TABLE 1B
Exemplary metabolites with tail enrichment predictive of ASD

Metabolite | Tail Effect | Odds Ratio | Confidence Interval (90%)
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF) | Left; p = 0.23 | 1.61 | 1.19-3.65
3-indoxyl sulfate | Left; p = 0.01 | 3.03 | 1.91-6.12
4-ethylphenyl sulfate | Left; p = 0.02 | 2.54 | 1.70-5.37
5-hydroxyindoleacetate | Right; p < 0.01 | 4.91 | 2.22-15.35
8-hydroxyoctanoate | Left; p = 0.01 | 3.03 | 1.64-5.34
gamma-CEHC | Left; p = 0.01 | 3.03 | 2.08-8.09
hydroxyisovaleroylcarnitine (C5) | Left; p = 0.23 | 1.61 | 1.01-2.73
indoleacetate | Left; p = 0.06 | 2.16 | 1.40-4.17
isovalerylglycine | Left; p = 0.12 | 1.86 | 1.09-3.14
lactate | Right; p = 0.06 | 2.64 | 1.23-4.64
N1-Methyl-2-pyridone-5-carboxamide | Left; p = 0.23 | 1.61 | 0.98-2.73
p-cresol sulfate | Left; p < 0.01 | 3.69 | 1.94-6.68
pantothenate (Vitamin B5) | Right; p = 0.06 | 2.64 | 1.58-7.04
phenylacetylglutamine | Left; p = 0.06 | 2.16 | 1.38-4.03
pipecolate | Right; p < 0.01 | 4.91 | 1.79-15.32
xanthine | Right; p = 0.15 | 2.08 | 1.25-4.92
hydroxy-chlorothalonil | Right; p < 0.01 | 4.94 | 2.77-17.71
octenoylcarnitine | Left; p = 0.01 | 3.03 | 1.84-7.31
1,5-anhydroglucitol (1,5-AG) | Left; p = 0.01 | 3.03 | 1.76-6.44

TABLE 1C
Exemplary metabolites with tail enrichment predictive of DD

Metabolite | Tail Effect | Odds Ratio | Confidence Interval (90%)
3-(3-hydroxyphenyl)propionate | Left; p < 0.01 | 0.36 | 0.24-0.62
3-indoxyl sulfate | Right; p = 0.1 | 0.52 | 0.32-0.91
isovalerylglycine | Right; p = 0.01 | 0.33 | 0.19-0.66
p-cresol sulfate | Right; p < 0.01 | 0.28 | 0.17-0.50
phenylacetylglutamine | Right; p < 0.01 | 0.20 | 0.15-0.46
pipecolate | Left; p = 0.30 | 0.69 | 0.40-0.95
xanthine | Left; p = 0.01 | 0.40 | 0.28-0.70
3-hydroxyhippurate | Left; p = 0.02 | 0.45 | 0.29-0.71

In some embodiments, at least two metabolites for analysis are selected from the group consisting of phenylacetylglutamine, xanthine, octenoylcarnitine, p-cresol sulfate, isovalerylglycine, gamma-CEHC, indoleacetate, pipecolate, 1,5-anhydroglucitol (1,5-AG), lactate, 3-(3-hydroxyphenyl)propionate, 3-indoxyl sulfate, pantothenate (Vitamin B5), hydroxy-chlorothalonil, and combinations thereof.

In some embodiments, at least three metabolites for analysis are selected from the group consisting of phenylacetylglutamine, xanthine, octenoylcarnitine, p-cresol sulfate, isovalerylglycine, gamma-CEHC, indoleacetate, pipecolate, 1,5-anhydroglucitol (1,5-AG), lactate, 3-(3-hydroxyphenyl)propionate, 3-indoxyl sulfate, pantothenate (Vitamin B5), hydroxy-chlorothalonil, and combinations thereof.
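As a hypothetical illustration of using several metabolites together, the sketch below counts how many panel metabolites place a sample in the tail that Table 1B associates with ASD. Only the tail directions are taken from Table 1B. The four-metabolite subset, the cutoff values (in arbitrary, median-normalized units), and the simple hit count are assumptions for illustration; they are not the classifier of the present disclosure.

```python
# Tail directions associated with ASD, as listed in Table 1B (subset only).
ASD_TAIL_DIRECTION = {
    "5-hydroxyindoleacetate": "upper",
    "pipecolate": "upper",
    "p-cresol sulfate": "lower",
    "gamma-CEHC": "lower",
}


def asd_tail_hits(sample, cutoffs, directions=ASD_TAIL_DIRECTION):
    """List panel metabolites whose level falls in the ASD-associated tail.

    sample:  metabolite name -> measured level
    cutoffs: metabolite name -> (lower_cutoff, upper_cutoff), e.g., the 15th and
             90th percentiles of a reference population (hypothetical values here)
    """
    hits = []
    for metabolite, direction in directions.items():
        if metabolite not in sample:
            continue  # not measured: treated as no information
        lower_cut, upper_cut = cutoffs[metabolite]
        level = sample[metabolite]
        if (direction == "upper" and level > upper_cut) or (
            direction == "lower" and level < lower_cut
        ):
            hits.append(metabolite)
    return hits


# Hypothetical cutoffs and sample levels in median-normalized arbitrary units.
cutoffs = {name: (0.5, 2.0) for name in ASD_TAIL_DIRECTION}
sample = {
    "5-hydroxyindoleacetate": 2.4,
    "pipecolate": 1.1,
    "p-cresol sulfate": 0.3,
    "gamma-CEHC": 0.9,
}
print(asd_tail_hits(sample, cutoffs))
# ['5-hydroxyindoleacetate', 'p-cresol sulfate']
```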

In some embodiments, information on the lack of a tail effect for a particular set of metabolites is used for risk assessment. In some embodiments, a lack of tail effects is determined to provide a null result (i.e., no information, as opposed to negative information). In some embodiments, a lack of tail effects is determined to be indicative of one classification over another (e.g., more indicative of DD than of ASD).

In some embodiments, the distribution curve is asymmetrical or non-Gaussian. In some embodiments, the distribution curve does not follow a parametric distribution pattern.

In some embodiments, information from mean differences (e.g., mean shifts) is combined with tail effect information for risk assessment. In some embodiments, information from mean differences is used for risk assessment without use of tail effect information.

In some embodiments, analysis of metabolites is combined with other types of information, e.g., genetic information, demographic information, and/or behavioral assessment, to determine a subject's risk for ASD or other disorders.

In some embodiments, ASD risk assessment is performed based at least in part on measured amounts of certain metabolites in a biological sample (e.g., blood, plasma, urine, saliva, stool) obtained from a subject, where the certain metabolites are found herein to exhibit “tail effects.” The inventors have found that a tail effect is not necessarily accompanied by a statistically significant mean shift between the two populations. Thus, a tail effect is a specific phenomenon, distinct from a mean shift.

In certain embodiments, a particular metabolite exhibits a right tail effect indicative of ASD over a non-ASD population (e.g., a DD population) when the metabolite is characterized as follows:

- a non-ASD population distribution curve is established for the metabolite in a non-ASD population (e.g., a DD population), with the x-axis indicating the level of the metabolite and the y-axis indicating the corresponding population;
- an ASD population distribution curve is established for the metabolite in an ASD population, with the x-axis indicating the level of the metabolite and the y-axis indicating the corresponding population; and
- the non-ASD population distribution curve and the ASD population distribution curve are characterized in that one or both of (A) and (B) hold(s):
    - (A) the ratio of (i) the area under the ASD population distribution curve for x > level n of the metabolite to (ii) the area under the non-ASD population distribution curve for x > level n of the metabolite is greater than 150% (e.g., >200%, >300%, >500%, >1000%, etc.), thereby providing predictive utility for differentiating between an ASD classification and a non-ASD classification for samples having > level n of the metabolite;
    - (B) where n′ is the minimum threshold metabolite level corresponding to the top decile (or any cutoff from about 5% to about 20%) of the combined non-ASD and ASD populations used to create the distribution curves, then for an unknown sample (e.g., a random sample selected from a population having an equal number of ASD and non-ASD members) having a metabolite level of at least n′, the odds of the sample being ASD as opposed to non-ASD are no less than 1.6:1 (e.g., no less than 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, or 10:1) (e.g., where p<0.3, p<0.2, p<0.1, p<0.05, p<0.03, or p<0.01, e.g., a statistically significant classification), thereby providing predictive utility for differentiating between an ASD classification and a non-ASD classification for samples having > level n′ of the metabolite.
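Criterion (A) can be estimated directly from measured levels. The sketch below is an illustration under stated assumptions, not the inventors' implementation: it assumes each distribution curve is normalized to unit area, so that the area above level n equals the fraction of that population with a level above n, and it uses made-up levels in the normalized units of the Examples. The function names are hypothetical. The left-tail criteria mirror this with x < m.

```python
import math
from typing import Sequence


def tail_area(levels: Sequence[float], n: float) -> float:
    """Area under a unit-normalized empirical distribution for x > n,
    i.e. the fraction of the population whose level exceeds n."""
    return sum(1 for x in levels if x > n) / len(levels)


def right_tail_area_ratio(asd: Sequence[float], non_asd: Sequence[float], n: float) -> float:
    """Criterion (A), right tail indicative of ASD: ASD area above n divided by
    non-ASD area above n. Returns inf if only ASD samples exceed n and nan if
    no sample from either population does."""
    asd_area, non_area = tail_area(asd, n), tail_area(non_asd, n)
    if non_area == 0:
        return math.inf if asd_area > 0 else math.nan
    return asd_area / non_area


# Illustrative, made-up normalized levels (not study data).
asd_levels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
non_asd_levels = [1, 1, 2, 2, 3, 3, 4, 4, 5, 9]
ratio = right_tail_area_ratio(asd_levels, non_asd_levels, n=5)
print(f"area ratio above n=5: {ratio:.2f}; exceeds 150%: {ratio > 1.5}")
```

If the curves were instead drawn as raw counts, the ratio would also depend on the group sizes, which is why the sketch normalizes each population separately.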

In certain embodiments, a particular metabolite exhibits a left tail effect indicative of ASD over a non-ASD population (e.g., a DD population) when the metabolite is characterized as follows:

- a non-ASD population distribution curve is established for the metabolite in a non-ASD population (e.g., a DD population), with the x-axis indicating the level of the metabolite and the y-axis indicating the corresponding population;
- an ASD population distribution curve is established for the metabolite in an ASD population, with the x-axis indicating the level of the metabolite and the y-axis indicating the corresponding population; and
- the non-ASD population distribution curve and the ASD population distribution curve are characterized in that one or both of (A) and (B) hold(s):
    - (A) the ratio of (i) the area under the ASD population distribution curve for x < level m of the metabolite to (ii) the area under the non-ASD population distribution curve for x < level m of the metabolite is greater than 150% (e.g., >200%, >300%, >500%, >1000%, etc.), thereby providing predictive utility for differentiating between an ASD classification and a non-ASD classification for samples having < level m of the metabolite;
    - (B) where m′ is the maximum threshold metabolite level corresponding to the bottom decile (or any cutoff from about 5% to about 20%) of the combined non-ASD and ASD populations used to create the distribution curves, then for an unknown sample (e.g., a random sample selected from a population having an equal number of ASD and non-ASD members) having a metabolite level of less than m′, the odds of the sample being ASD as opposed to non-ASD are no less than 1.6:1 (e.g., no less than 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, or 10:1) (e.g., where p<0.3, p<0.2, p<0.1, p<0.05, p<0.03, or p<0.01, e.g., a statistically significant classification), thereby providing predictive utility for differentiating between an ASD classification and a non-ASD classification for samples having < level m′ of the metabolite.

In certain embodiments, a particular metabolite exhibits a right tail effect indicative of non-ASD (e.g., DD) over an ASD population when the metabolite is characterized as follows:

- a non-ASD population distribution curve is established for the metabolite in a non-ASD population (e.g., a DD population), with the x-axis indicating the level of the metabolite and the y-axis indicating the corresponding population;
- an ASD population distribution curve is established for the metabolite in an ASD population, with the x-axis indicating the level of the metabolite and the y-axis indicating the corresponding population; and
- the non-ASD population distribution curve and the ASD population distribution curve are characterized in that one or both of (A) and (B) hold(s):
    - (A) the ratio of (i) the area under the non-ASD population distribution curve for x > level n of the metabolite to (ii) the area under the ASD population distribution curve for x > level n of the metabolite is greater than 150% (e.g., >200%, >300%, >500%, >1000%, etc.), thereby providing predictive utility for differentiating between a non-ASD classification and an ASD classification for samples having > level n of the metabolite;
    - (B) where n′ is the minimum threshold metabolite level corresponding to the top decile (or any cutoff from about 5% to about 20%) of the combined non-ASD and ASD populations used to create the distribution curves, then for an unknown sample (e.g., a random sample selected from a population having an equal number of ASD and non-ASD members) having a metabolite level of greater than n′, the odds of the sample being non-ASD as opposed to ASD are no less than 1.6:1 (e.g., no less than 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, or 10:1) (e.g., where p<0.3, p<0.2, p<0.1, p<0.05, p<0.03, or p<0.01, e.g., a statistically significant classification), thereby providing predictive utility for differentiating between a non-ASD classification and an ASD classification for samples having > level n′ of the metabolite.

In certain embodiments, a particular metabolite exhibits a left tail effect indicative of non-ASD (e.g., DD) over an ASD population when the metabolite is characterized as follows:

- a non-ASD population distribution curve is established for the metabolite in a non-ASD population (e.g., a DD population), with the x-axis indicating the level of the metabolite and the y-axis indicating the corresponding population;
- an ASD population distribution curve is established for the metabolite in an ASD population, with the x-axis indicating the level of the metabolite and the y-axis indicating the corresponding population; and
- the non-ASD population distribution curve and the ASD population distribution curve are characterized in that one or both of (A) and (B) hold(s):
    - (A) the ratio of (i) the area under the non-ASD population distribution curve for x < level m of the metabolite to (ii) the area under the ASD population distribution curve for x < level m of the metabolite is greater than 150% (e.g., >200%, >300%, >500%, >1000%, etc.), thereby providing predictive utility for differentiating between a non-ASD classification and an ASD classification for samples having < level m of the metabolite;
    - (B) where m′ is the maximum threshold metabolite level corresponding to the bottom decile (or any cutoff from about 5% to about 20%) of the combined non-ASD and ASD populations used to create the distribution curves, then for an unknown sample (e.g., a random sample selected from a population having an equal number of ASD and non-ASD members) having a metabolite level of less than m′, the odds of the sample being non-ASD as opposed to ASD are no less than 1.6:1 (e.g., no less than 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, or 10:1) (e.g., where p<0.3, p<0.2, p<0.1, p<0.05, p<0.03, or p<0.01, e.g., a statistically significant classification), thereby providing predictive utility for differentiating between a non-ASD classification and an ASD classification for samples having < level m′ of the metabolite.
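Criterion (B) of the four tail definitions above can be checked in one pass, because in a population with equal numbers of ASD and non-ASD members the odds that a tail member is ASD reduce to the ratio of the two in-tail fractions. This is a minimal sketch, assuming plain lists of normalized levels, a nearest-rank cutoff at the given fraction of the combined populations (10% by default), and inclusive comparisons at m′ and n′ (the text uses "at least", "greater than", and "less than" in different places). `classify_tails`, `balanced_odds`, and the label strings are hypothetical. Odds below 1 are read as evidence for non-ASD, so the 1.6:1 requirement is applied to their reciprocal.

```python
import math
from typing import List, Sequence


def balanced_odds(asd_in_tail: int, n_asd: int, non_in_tail: int, n_non: int) -> float:
    """Odds that a tail member is ASD rather than non-ASD in a population with
    equal numbers of each: the ratio of the two in-tail fractions."""
    asd_frac, non_frac = asd_in_tail / n_asd, non_in_tail / n_non
    if non_frac == 0:
        return math.inf if asd_frac > 0 else math.nan
    return asd_frac / non_frac


def classify_tails(asd: Sequence[float], non_asd: Sequence[float],
                   cutoff_fraction: float = 0.10, min_odds: float = 1.6) -> List[str]:
    combined = sorted(list(asd) + list(non_asd))
    k = max(1, round(cutoff_fraction * len(combined)))
    m_prime = combined[k - 1]  # maximum level of the bottom cutoff_fraction
    n_prime = combined[-k]     # minimum level of the top cutoff_fraction
    labels = []
    for side, in_tail in (("left", lambda x: x <= m_prime),
                          ("right", lambda x: x >= n_prime)):
        odds = balanced_odds(sum(1 for x in asd if in_tail(x)), len(asd),
                             sum(1 for x in non_asd if in_tail(x)), len(non_asd))
        if math.isnan(odds):
            continue  # no sample from either group in this tail
        if odds >= min_odds:
            labels.append(f"{side} tail indicative of ASD")
        elif odds == 0 or 1 / odds >= min_odds:
            labels.append(f"{side} tail indicative of non-ASD")
    return labels


# Made-up normalized levels, for illustration only.
print(classify_tails([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 1, 2, 2, 3, 3, 4, 4, 5, 9]))
```

A single metabolite can receive both a left and a right label, which matches metabolites such as p-cresol sulfate that appear in both Table 1B and Table 1C.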

In certain embodiments, a risk assessment is performed using a plurality of metabolites that exhibit tail effects. It has been observed that, for assessment of ASD, there are particular groups of metabolites (e.g., two or more metabolites) that provide complementary diagnostic/risk assessment information. For example, ASD-positive individuals who are identifiable by analysis of the level of a first metabolite (e.g., individuals within an identified tail of the first metabolite) are not the same ASD-positive individuals who are identifiable by analysis of a second metabolite (or there may be a low, non-zero degree of overlap). The tail of the first metabolite is predictive of certain ASD individuals, while the tail of the second metabolite is predictive of other ASD individuals. Without wishing to be bound by a particular theory, this discovery may reflect the multifaceted nature of ASD itself.

Thus, in certain embodiments, the risk assessment method includes identifying whether a subject falls within any of a multiplicity of identified metabolite tails involving a plurality of metabolites, e.g., where the predictors of the different metabolite tails are at least partially disjoint, e.g., they have low mutual information, such that risk prediction improves as multiple metabolites with low mutual information are incorporated.
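The text does not specify how mutual information between metabolite tails is estimated. A minimal sketch, assuming a simple plug-in estimate (in bits) over binary "in tail / not in tail" indicators for the same samples: identical indicators give the maximum value (their entropy), while statistically independent indicators give zero. The indicator values and the function name are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Plug-in estimate (in bits) of the mutual information between two
    equal-length sequences of discrete labels, e.g. 1 = in tail, 0 = not."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    n = len(a)
    joint, count_a, count_b = Counter(zip(a, b)), Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi


# Hypothetical tail-membership indicators for the same eight samples.
tail_metabolite_1 = [1, 1, 0, 0, 1, 0, 1, 0]
tail_metabolite_2 = [1, 0, 1, 0, 1, 0, 0, 1]
print(mutual_information(tail_metabolite_1, tail_metabolite_2))
```

With only a few hundred samples the plug-in estimate is biased upward, so in practice it would be compared across metabolite pairs rather than read as an absolute quantity.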

EXAMPLES

Subjects

Blood samples were collected from subjects between the ages of 18 and 60 months who were referred to nineteen developmental evaluation centers for evaluation of a possible developmental disorder other than isolated motor problems. Informed consent was obtained for all subjects. Subjects with a prior diagnosis of ASD from a clinic specialized in pediatric development evaluation, or who were unable or unwilling to complete study procedures, were excluded from the study.

The subjects are those who enrolled in the SynapDx Autism Spectrum Disorder Gene Expression Analysis (STORY) study. The STORY study was performed in accordance with current ICH guidelines on Good Clinical Practice (GCP) and applicable regulatory requirements. GCP is an international ethical and scientific quality standard for designing, conducting, recording, and reporting studies that involve the participation of human subjects. Compliance with this standard provides public assurance that the rights, safety, and wellbeing of study subjects are protected, consistent with the principles that originated in the Declaration of Helsinki, and that the clinical study data are credible.

Results shown in FIGS. 1 to 12 are based on 180 blood samples from males in the STORY study. The sample set included 122 ASD samples and 58 DD (non-ASD) samples. ASD diagnosis followed DSM-V diagnostic criteria. Additional results are based on a broader set of 299 blood samples from male subjects in the STORY study. The broader sample set included 198 ASD samples and 101 DD samples.

For all tests, blood samples of approximately 3 mL were collected in EDTA tubes, and plasma was prepared by centrifuging the tubes. The plasma was then frozen and shipped to a laboratory for analysis. At the laboratory, methanol extraction of the samples was conducted, and the extracts were analyzed by an optimized ultrahigh-performance liquid chromatography or gas chromatography/tandem mass spectrometry (UHPLC/MS/MS or GC/MS/MS) method (see, for example, Anal. Chem., 2009, 81, 6656-6667).

Data Analysis

Metabolites in blood samples were quantified for both male and female subjects. Samples were assayed for levels of metabolites, and each level was quantified as a concentration in arbitrary units normalized to the median concentration of all samples measured on a given day. For example, a value greater than 1 refers to a quantity of metabolite that is greater than the median of samples for that day, and a value less than 1 refers to a quantity that is less than the median. Cross-validation was then carried out, in which samples were randomly divided into non-overlapping training/testing sets on which the unbiased performance of machine learning classifiers was evaluated. Twenty-one metabolites have been identified that are highly informative, individually and collectively, for predicting ASD, particularly in male subjects.
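The daily median normalization can be sketched as follows. This assumes the normalization is applied separately for each metabolite and that each measurement carries an assay-day label; the tuple layout, the sample identifiers, and the raw readings are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def normalize_to_daily_median(rows: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """rows: (sample_id, assay_day, raw_level) for one metabolite.
    Returns sample_id -> raw_level / median raw level of that assay day,
    so 1.0 means 'equal to that day's median'."""
    rows = list(rows)
    by_day = defaultdict(list)
    for _, day, raw in rows:
        by_day[day].append(raw)
    day_median = {day: median(levels) for day, levels in by_day.items()}
    return {sample_id: raw / day_median[day] for sample_id, day, raw in rows}


# Hypothetical raw instrument readings for one metabolite on two assay days.
readings = [("s1", "day1", 10.0), ("s2", "day1", 20.0), ("s3", "day1", 30.0),
            ("s4", "day2", 4.0), ("s5", "day2", 8.0)]
print(normalize_to_daily_median(readings))
```

Dividing by a per-day median removes day-to-day instrument drift, which is what makes thresholds such as those in Table 2 comparable across assay batches.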

Example 1 Discerning Metabolite Level Information

This example shows that valuable information for ASD risk assessment can be discerned from identification and analysis of tail effects in a sample distribution, information that would otherwise be missed by traditional analyses (e.g., mean shift-based analysis).

Once a metabolite level is determined, there are multiple ways to use the information for risk assessment, including mean shifts and tail effects. On their own, mean shifts were found to provide some, but not optimal, predictive information. An exemplary mean shift is shown in FIG. 1. In this figure, the ASD distribution is shifted to the right of the non-ASD (DD) distribution.

In addition to traditional mean shift analysis, the inventors discerned additional information from the samples. Metabolite distribution curves were plotted for ASD and non-ASD (here, DD) samples, and it was discovered that, for a subset of the metabolites measured, samples from either the ASD or the DD population were enriched in a right (upper) or left (lower) tail (i.e., a tail effect). A representative tail effect is shown in FIG. 2. Notably, the two distributions shared nearly identical mean values (i.e., there was minimal or no mean shift). Thus, the predictive value of the metabolite would not be discernible from traditional analysis of mean shifts.

Metabolites may exhibit a right (upper) tail effect, a left (lower) tail effect, or both. ASD and non-ASD (here, DD) distribution curves for a representative metabolite, 5-hydroxyindoleacetate (5-HIAA), are shown in FIG. 3. A clear right tail effect is observed, i.e., the ASD distribution has a larger area under the curve (AUC) in the right tail. Thus, samples with high levels of this metabolite are highly enriched in ASD-population members. For this metabolite, both the mean shift (indicated by the t-test value) and the right tail (indicated by the ‘extremes’ Fisher's test value) are statistically significant.
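The ‘extremes’ Fisher's test compares how ASD and non-ASD samples split between the tail and the rest of the distribution. A minimal, dependency-free sketch, assuming the tail has already been defined by a threshold and that a two-sided test is used (the sidedness is not stated here); `scipy.stats.fisher_exact` should return the same two-sided p-value where SciPy is available. The counts in the usage line are hypothetical.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table
               ASD  non-ASD
        tail    a      b
        rest    c      d
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, col1)

    def prob(x: int) -> float:
        # P(top-left cell == x) with all row and column totals held fixed
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # small relative tolerance so tables tied with the observed one are counted
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


# Hypothetical counts: 8 of 9 ASD samples and 2 of 7 non-ASD samples fall in the tail.
print(fisher_exact_two_sided(8, 2, 1, 5))
```

The mean shift in the same paragraph is assessed separately with a t-test, which is not shown here.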

ASD and non-ASD (here, DD) distribution curves for another illustrative metabolite, gamma-CEHC, are shown in FIG. 4. A clear left tail effect is observed, i.e., the ASD distribution has a larger area under the curve in the left tail. Thus, samples with low levels of this metabolite are highly enriched in ASD-population members. For this metabolite, the mean shift (indicated by the t-test value) is not statistically significant, while the left tail is statistically significant.

These data illustrate that identification and analysis of tail effects provide additional information for risk assessment that cannot be obtained via traditional mean shift analysis.

Example 2 Strong Prediction of ASD from Selected Metabolites Demonstrating Tail Effects

This example illustrates the assessment of tail effects for prediction of ASD. The inventors identified statistically significant tail effects for a number of metabolites in samples obtained from male subjects. The tail effects were singly and cumulatively informative about which population the subject belonged to, i.e., the ASD population or the DD population. Table 1 shows an exemplary panel of twenty-one metabolites exhibiting ASD vs. DD tail effects with high predictive power.

Table 1B shows the metabolites of the 21-metabolite panel that have tail effects predictive of ASD. The statistical significance (p-value) of each tail effect, as well as its location on the distribution curve (i.e., left tail effect or right tail effect), is indicated. An odds ratio greater than one indicates predictive power for ASD. For example, 5-HIAA has a right tail with an odds ratio of 4.91, indicating that in the STORY study data set (in which the ratio of ASD to DD samples was 2:1), approximately 10 ASD samples for every DD sample were in the right tail (4.91 × 2 ≈ 9.8). The confidence intervals were estimated by bootstrap methods. One thousand individual bootstrap samples were generated from the STORY data by resampling with replacement. For each bootstrap sample, the position of the tail and the corresponding odds ratio were determined. The 90% confidence interval was calculated from the distribution of observed odds ratios.
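The odds-ratio convention implied by the 5-HIAA example (the ASD:DD count ratio inside the tail divided by the overall ASD:DD ratio) and the bootstrap procedure can be sketched as follows. These are assumptions, not a statement of the inventors' code: the right tail is taken as the top 10% of the combined samples, resampling is done separately within the ASD and DD groups, and the 90% interval is read from the 5th and 95th percentiles of the bootstrap odds ratios. All names and data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def right_tail_odds_ratio(asd: Sequence[float], dd: Sequence[float],
                          top_fraction: float = 0.10) -> float:
    """(ASD:DD count ratio inside the right tail) / (overall ASD:DD ratio).
    The tail cutoff is re-derived from the data passed in."""
    combined = sorted(list(asd) + list(dd))
    k = max(1, round(top_fraction * len(combined)))
    cutoff = combined[-k]
    asd_in = sum(1 for x in asd if x >= cutoff)
    dd_in = sum(1 for x in dd if x >= cutoff)
    if dd_in == 0:
        return math.inf
    return (asd_in / dd_in) / (len(asd) / len(dd))


def bootstrap_ci(asd: Sequence[float], dd: Sequence[float], n_boot: int = 1000,
                 level: float = 0.90, seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    ratios: List[float] = []
    for _ in range(n_boot):
        # Resample each group with replacement; the tail position is recomputed.
        ratios.append(right_tail_odds_ratio(rng.choices(asd, k=len(asd)),
                                            rng.choices(dd, k=len(dd))))
    ratios.sort()  # math.inf values, if any, sort to the end
    lo = round((1 - level) / 2 * n_boot)
    hi = round((1 + level) / 2 * n_boot) - 1
    return ratios[lo], ratios[hi]


# Made-up levels with a 2:1 ASD:DD ratio, as in the STORY data set.
asd = list(range(1, 11)) * 2
dd = [1, 1, 2, 2, 3, 3, 4, 4, 5, 9]
print(right_tail_odds_ratio(asd, dd), bootstrap_ci(asd, dd))
```

In the made-up data, four ASD samples and one DD sample fall in the tail, a 4:1 ratio that is divided by the 2:1 base ratio to give an odds ratio of 2.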

Based on these criteria, nineteen metabolites of the 21-metabolite panel were found to be predictive of ASD.

Table 1C shows metabolites having tail effects that are predictive of DD. The statistical significance (p-value) of each tail effect, as well as its location on the distribution curve (i.e., left tail effect or right tail effect), is indicated. An odds ratio of less than one indicates predictive power for DD. Based on these criteria, eight metabolites of the 21-metabolite panel were found to be predictive of DD. The odds ratios and 90% confidence intervals were determined as for ASD, taking into account the 1:2 ratio of DD to ASD samples in the STORY study.

Notably, certain metabolites demonstrate a single tail effect (either left or right) with predictive power for either ASD or DD, whereas other metabolites demonstrate both a left and a right tail effect, together providing predictive power for both ASD and DD. For example, phenylacetylglutamine and p-cresol sulfate demonstrate both right and left tail effects.

The tail effects of the 21 metabolites listed in Table 1 are shown individually in the graphs of FIGS. 13A to 13U. Each graph shows the distributions of one metabolite in both the ASD and DD populations. The legend at the top of each panel shows the statistical significance of the left and right tails for the metabolite (p-value generated by Fisher's test).

Some metabolites, e.g., phenylacetylglutamine, exhibit both mean shifts and tail effects. As shown in FIG. 5, phenylacetylglutamine exhibits a statistically significant mean shift (t-test; p=0.001) and statistically significant left and right tail effects between the two populations (‘extremes’ signifies tail effect; Fisher's test; p=0.0001). The distributions appear as shifted Gaussian curves between the ASD and DD populations.

Table 2 shows the threshold values used to determine the tail effects for the 21-metabolite panel, based on the underlying population distribution of each metabolite in the ASD and non-ASD populations. Illustratively, the upper threshold value corresponds to the 90th percentile of the distribution, while the lower threshold value corresponds to the 15th percentile. The threshold values are expressed in the same median-normalized units as the metabolite levels; absolute threshold values (e.g., in ng/mL, nM, etc.) can be calculated by combining the values in Table 2 with average concentrations of the metabolites in a population.

TABLE 2 Threshold levels for left tail (at or below 15th percentile) and right tail (at or above 90th percentile) of metabolite distribution curve

| Metabolite | Left tail cut-off | Right tail cut-off |
| --- | --- | --- |
| 1,5-anhydroglucitol (1,5-AG) | 0.680 | 1.561 |
| 3-(3-hydroxyphenyl)propionate | 0.270 | 3.462 |
| 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF) | 0.396 | 13.734 |
| 3-indoxyl sulfate | 0.584 | 1.601 |
| 4-ethylphenyl sulfate | 0.281 | 4.054 |
| 5-hydroxyindoleacetate | 0.729 | 2.027 |
| 8-hydroxyoctanoate | 0.711 | 1.411 |
| gamma-CEHC | 0.505 | 2.199 |
| hydroxyisovaleroylcarnitine (C5) | 0.619 | 1.767 |
| indoleacetate | 0.707 | 1.690 |
| isovalerylglycine | 0.438 | 3.182 |
| lactate | 0.801 | 1.288 |
| N1-Methyl-2-pyridone-5-carboxamide | 0.554 | 2.254 |
| p-cresol sulfate | 0.378 | 2.231 |
| pantothenate (Vitamin B5) | 0.675 | 1.980 |
| phenylacetylglutamine | 0.498 | 2.305 |
| pipecolate | 0.651 | 1.711 |
| xanthine | 0.731 | 1.507 |
| hydroxy-chlorothalonil | 0.597 | 2.645 |
| octenoylcarnitine | 0.479 | 2.214 |
| 3-hydroxyhippurate | 0.375 | 3.651 |
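Applying Table 2 to a new sample comes down to two comparisons per metabolite. The sketch copies four rows of Table 2; it assumes the sample's levels are already in the same median-normalized units and treats the cut-offs as inclusive ("at or below", "at or above"). Converting a cut-off to an absolute concentration by multiplying it by a reference (e.g., population average) concentration is an assumption consistent with the normalization described above; the reference value in the example is made up, and the function names are hypothetical.

```python
from typing import Dict, Optional

# Four rows copied from Table 2: metabolite -> (left tail cut-off, right tail cut-off).
TABLE_2_CUTOFFS = {
    "5-hydroxyindoleacetate": (0.729, 2.027),
    "gamma-CEHC": (0.505, 2.199),
    "p-cresol sulfate": (0.378, 2.231),
    "xanthine": (0.731, 1.507),
}


def tail_calls(sample: Dict[str, float]) -> Dict[str, Optional[str]]:
    """Map each measured metabolite to 'left', 'right', or None (no tail)."""
    calls: Dict[str, Optional[str]] = {}
    for name, level in sample.items():
        left_cut, right_cut = TABLE_2_CUTOFFS[name]
        if level <= left_cut:
            calls[name] = "left"
        elif level >= right_cut:
            calls[name] = "right"
        else:
            calls[name] = None
    return calls


def absolute_cutoff(normalized_cutoff: float, reference_concentration: float) -> float:
    """Hypothetical conversion of a normalized cut-off into concentration units."""
    return normalized_cutoff * reference_concentration


print(tail_calls({"5-hydroxyindoleacetate": 2.5, "gamma-CEHC": 0.4, "xanthine": 1.0}))
print(absolute_cutoff(2.027, reference_concentration=50.0))  # made-up reference, e.g. in nM
```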

Example 3 Predicting ASD with Multiple Metabolites

The information provided by multiple metabolites (e.g., those listed in Table 1) can be used individually or as a group to assist in disease risk prediction. Particularly informative sets of metabolites include members that do not correlate well with each other and have low collinearity (i.e., low mutuality). For example, FIG. 6 plots 5-HIAA levels against gamma-CEHC levels, demonstrating a lack of correlation between informative levels of the two metabolites. In particular, the ASD individuals identified in the tail of 5-HIAA (FIG. 3) are generally not the same ASD individuals identified in the tail of gamma-CEHC. Thus, the metabolites 5-HIAA and gamma-CEHC are deemed to provide complementary information. Tail-enriched metabolites with low mutuality provide complementary classification information.

FIG. 7 is a chart indicating, for each of the 180 samples, whether or not the sample was within a tail of each of the metabolites of a 12-metabolite panel. In this exemplary panel, tails for two metabolites, xanthine and p-cresol sulfate, are predictive of non-ASD (e.g., DD), while tails for the other ten metabolites are predictive of ASD.

When multiple metabolites are assessed, the number of possible combinations of tail effects increases, as does the potential aggregated tail effect count. The distributions of aggregated tail effect counts in the ASD and non-ASD populations can be plotted, and the resulting distributions can be used to determine a suitable separation between ASD and non-ASD when an unknown sample is measured. As shown in FIG. 8A, ASD and non-ASD (here, DD) samples can be further analyzed by employing a voting (e.g., binning) scheme to further exploit the complementary information provided by the metabolites for which a tail effect was observed. Data for a total of 12 metabolites are shown. In one particular scheme, for a given sample, the number of metabolites for which the sample fell within an ASD-predictive tail was summed, as was the number of metabolites for which the sample fell within a non-ASD (here, DD)-predictive tail. These two values are plotted in FIG. 8A, with the non-ASD count on the x-axis and the ASD count on the y-axis. Notably, as the number of ASD-enriched metabolites increased (higher on the y-axis) and the number of non-ASD-enriched metabolites decreased (lower on the x-axis), there appeared to be less mixing of non-ASD dots among ASD dots, suggesting a lower likelihood of a false positive diagnosis of ASD. Conversely, as the number of ASD-enriched metabolites decreased (lower on the y-axis) and the number of non-ASD-enriched metabolites increased (higher on the x-axis), there was less mixing of ASD dots among non-ASD dots, suggesting a lower likelihood of a false positive diagnosis of DD.

The samples were divided into four different bins, shown in FIG. 8B. The bins at the top and at the bottom right in particular showed clear separation, facilitating ASD or DD risk evaluation.

Of the four bins shown in FIG. 8B, the bin most strongly predictive of ASD included samples having 2 or more ASD-enriched features and either 0 or 1 non-ASD-enriched features. The bin having 1 ASD-enriched feature and either 0 or 1 non-ASD-enriched features was also predictive of ASD, though less strongly than the preceding bin. The bin having 1 or more non-ASD-enriched features and 0 ASD-enriched features was strongly predictive of non-ASD. A bin of samples having no ASD-enriched features and no non-ASD-enriched features may also provide predictive information in some circumstances.
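The counting and binning described above can be sketched as follows. The tail-to-class assignments are copied from Tables 1B and 1C for four metabolites only; the bin labels, the function names, and the catch-all for combinations not described above (one or more ASD-enriched features together with two or more non-ASD-enriched features) are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Feature = Tuple[str, str]  # (metabolite, "left" or "right")

# Subset of Tables 1B and 1C: which tails are enriched for which class.
ASD_TAILS: Set[Feature] = {("5-hydroxyindoleacetate", "right"), ("gamma-CEHC", "left"),
                           ("p-cresol sulfate", "left")}
NON_ASD_TAILS: Set[Feature] = {("p-cresol sulfate", "right"), ("xanthine", "left")}


def count_features(calls: Dict[str, Optional[str]]) -> Tuple[int, int]:
    """Return (ASD-enriched count, non-ASD-enriched count) for one sample,
    given metabolite -> 'left' / 'right' / None tail calls."""
    hits = {(name, side) for name, side in calls.items() if side is not None}
    return len(hits & ASD_TAILS), len(hits & NON_ASD_TAILS)


def assign_bin(asd_count: int, non_asd_count: int) -> str:
    if asd_count >= 2 and non_asd_count <= 1:
        return "strongly ASD-predictive"
    if asd_count == 1 and non_asd_count <= 1:
        return "ASD-predictive"
    if asd_count == 0 and non_asd_count >= 1:
        return "non-ASD-predictive"
    if asd_count == 0 and non_asd_count == 0:
        return "no tail features"
    return "not covered by the four bins"


counts = count_features({"5-hydroxyindoleacetate": "right", "gamma-CEHC": "left",
                         "xanthine": None, "p-cresol sulfate": None})
print(counts, assign_bin(*counts))
```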

In one exemplary voting scheme, votes are tallied for a given sample, for example, with each ASD-enriched metabolite adding a point and each non-ASD-enriched metabolite subtracting a point. A sample with a positive result (e.g., equal to or greater than 1) may be considered ASD (or as having a significant risk of ASD), and a sample with a negative result (equal to or less than −1) may be considered non-ASD (or as having a significant likelihood of being non-ASD). A sample with a zero result may be considered likely non-ASD or likely ASD, depending on the distribution of ASD to non-ASD in the samples, or may be returned as an indeterminate or “no classification” result. Similarly, FIG. 8C shows vote tallying results for the 21-metabolite panel described in Table 1.
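A minimal sketch of the point-tally rule, assuming the per-sample counts of ASD-enriched and non-ASD-enriched features have already been determined. How a zero tally is resolved is left to the caller through a hypothetical `zero_result` parameter, mirroring the options listed in the paragraph; `vote` is likewise a hypothetical name.

```python
from typing import Optional


def vote(asd_count: int, non_asd_count: int, zero_result: Optional[str] = None) -> str:
    """+1 per ASD-enriched feature, -1 per non-ASD-enriched feature.
    A tally of zero returns `zero_result` (e.g. 'ASD' or 'non-ASD', chosen
    from the class balance) or 'indeterminate' when no policy is given."""
    score = asd_count - non_asd_count
    if score >= 1:
        return "ASD"
    if score <= -1:
        return "non-ASD"
    return zero_result if zero_result is not None else "indeterminate"


for counts in [(3, 1), (1, 2), (1, 1)]:
    print(counts, vote(*counts))
```

Unlike the four-bin scheme, the tally lets one ASD-enriched feature cancel one non-ASD-enriched feature, so (2, 2) and (0, 0) receive the same result.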

In another exemplary scoring system (shown in FIG. 21), the log2 values of the odds ratios (log2 OR) of the ASD and DD features exhibited by a sample are summed across metabolites to calculate a risk score for ASD or DD.
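The exact scoring behind FIG. 21 is not reproduced here. This sketch assumes that a sample's score is the sum of log2 OR over the tails it falls within, using odds ratios copied from Tables 1B and 1C for a few tails. Because log2 OR is positive for ASD-enriched tails (OR > 1) and negative for DD-enriched tails (OR < 1), the two kinds of feature pull the score in opposite directions, and stronger tails carry more weight than in the one-point vote. The names are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

# Odds ratios copied from Tables 1B (ASD, > 1) and 1C (DD, < 1) for a few tails.
TAIL_ODDS_RATIOS: Dict[Tuple[str, str], float] = {
    ("5-hydroxyindoleacetate", "right"): 4.91,
    ("gamma-CEHC", "left"): 3.03,
    ("p-cresol sulfate", "left"): 3.69,
    ("p-cresol sulfate", "right"): 0.28,
    ("xanthine", "left"): 0.40,
}


def log2_or_score(features: Iterable[Tuple[str, str]]) -> float:
    """Sum of log2(odds ratio) over the tails a sample falls in; positive
    totals lean toward ASD and negative totals toward DD."""
    return sum(math.log2(TAIL_ODDS_RATIOS[f]) for f in features)


score = log2_or_score([("5-hydroxyindoleacetate", "right"), ("xanthine", "left")])
print(round(score, 3))
```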

Tail effect information may be used to differentiate between a subject having ASD and a subject having a non-ASD condition. Likewise, tail effect information may be used to predict a subject's risk for another disease or condition, e.g., DD.

For example, the tail effect distribution for a non-ASD population, e.g., DD, as shown in FIGS. 8A and 8C, can be used to establish a reference value for the average tail effect sum for a given number of metabolites in that population. This average value can serve as a reference against which the tail effect sum of a sample from an unknown subject is compared, and can be used to assess the subject's risk for ASD without having to obtain the population distribution curves of metabolites in both the ASD and non-ASD populations.

Tail effect information, e.g., as used in the exemplary voting schemes above or similar schemes, may also be combined with traditional mean-shift information and/or other classification information for improved classification results.

It is demonstrated herein that the predictability of ASD risk can be increased by analysis of combinations of certain metabolites. For example, FIGS. 9-11 and 13-14A-D illustrate how use of a voting scheme can increase the AUC of the classifier and improve predictive ability. Using subsets of a 12-metabolite panel, ASD predictive power (y-axis) increased as the number of metabolites in the subsets increased from 1 to 12 (FIG. 9). The choice of classifier (i.e., logistic regression, naive Bayes, or support vector machine (SVM)) and the selection of features also affect the AUC (FIG. 9). FIG. 10A shows, for the same population and a 12-metabolite panel, the trichotomized prediction of ASD risk using different features and classifiers, while FIG. 10B shows the results using a 21-metabolite panel. FIGS. 11A and 11B show the improvements in ASD risk prediction using voting schemes with the 12-metabolite panel (FIG. 11A) and the 21-metabolite panel (FIG. 11B). Together, these analyses demonstrate that by selecting targeted metabolites and using appropriate statistical tools, a high degree of confidence in ASD risk assessment can be achieved. For example, as shown in FIG. 12, an AUC of at least 0.74 was obtained using 12 metabolites following the methods described above.
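The AUC values cited above summarize how well a per-sample score separates the two groups. For readers who want to compute such a number from scores such as vote tallies, a minimal sketch using the rank-based (Mann-Whitney) definition follows. It is O(n*m), which is adequate for a few hundred samples, and it is not the classifier pipeline behind the figures; the scores in the example are hypothetical.

```python
from typing import Sequence


def auc(asd_scores: Sequence[float], non_asd_scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen ASD sample scores higher than a randomly chosen non-ASD sample
    (ties count one half), i.e. the normalized Mann-Whitney U statistic."""
    wins = 0.0
    for s in asd_scores:
        for t in non_asd_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(asd_scores) * len(non_asd_scores))


# Hypothetical vote tallies (ASD-enriched minus non-ASD-enriched features).
print(auc([2, 1, 1, 0, 3], [0, -1, 1, -2]))
```

Tie handling matters for vote tallies, since many samples share the same small integer score.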

Example 4 Selection of High Impact Metabolites from Metabolomics Data

Samples from ASD and DD subjects were screened for approximately 600 known metabolites (shown in Table 3). From the initial set of 600, 84 candidate metabolites were identified as exhibiting a tail effect. A subset of the 84 metabolites detected in the samples was elucidated and is identified by name in Table 4. Metabolite panels (e.g., the 12- and 21-metabolite panels) were selected from the set of 84 candidate metabolites based on high individual-metabolite AUCs. Certain candidate metabolites were excluded from panels based on factors such as an association with medication or age.

TABLE 3 Four hundred sixty-five (465) elucidated metabolites of the initial set of 600 metabolites assayed

glycine; N-acetylglycine; sarcosine (N-Methylglycine); serine; N-acetylserine; threonine; N-acetylalanine; aspartate; asparagine; glutamine; N-acetylglutamate; N-acetyl-aspartyl-glutamate (NAAG); N-acetylhistidine; 1-methylhistidine; 3-methylhistidine; imidazole lactate; lysine; N6-acetyllysine; glutarate (pentanedioate); glutaroylcarnitine (C5); 3-methylglutarylcarnitine-1; phenylalanine; N-acetylphenylalanine; phenylpyruvate; phenylacetylglutamine; tyrosine; N-acetyltyrosine; phenol sulfate; p-cresol sulfate; o-cresol sulfate; 3-methoxytyramine sulfate; 3-(3-hydroxyphenyl)propionate; 3-phenylpropionate (hydrocinnamate); tryptophan; N-acetyltryptophan; indolelactate; 3-indoxyl sulfate; kynurenine; kynurenate; indoleacetylglutamine; tryptophan betaine; C-glycosyltryptophan; N-acetylleucine; 4-methyl-2-oxopentanoate; isovalerate (C5); beta-hydroxyisovalerate; hydroxyisovaleroylcarnitine (C5); alpha-hydroxyisovalerate; 3-methyl-2-oxovalerate; 2-methylbutyroylcarnitine (C5); tiglyl carnitine (C5); valine; N-acetylvaline; 3-methyl-2-oxobutyrate; 3-hydroxyisobutyrate; alpha-hydroxyisocaproate; methionine; S-adenosylhomocysteine (SAH); alpha-ketobutyrate; 2-aminobutyrate; S-methylcysteine; taurine; arginine; proline; citrulline; homoarginine; N-delta-acetylornithine; N-methyl proline; hydroxyproline; creatinine; acisoga; 5-methylthioadenosine (MTA); 4-guanidinobutanoate; glutathione, oxidized (GSSG); cys-gly, oxidized; gamma-glutamylisoleucine; gamma-glutamylleucine; gamma-glutamylmethionine; gamma-glutamyltyrosine; gamma-glutamylvaline; N-acetylcarnosine; cyclo(gly-pro); cyclo(leu-pro); cyclo(L-phe-L-pro); isoleucylglutamine; isoleucylglycine; isoleucylvaline; leucylglutamate; leucylglycine; leucylphenylalanine; phenylalanylalanine; phenylalanylarginine; phenylalanylaspartate; phenylalanylleucine; phenylalanylmethionine; phenylalanylphenylalanine; pyroglutamylglycine; pyroglutamylvaline; serylleucine; tryptophylphenylalanine; valylglycine; valylleucine; glucose; 3-phosphoglycerate; pyruvate; ribitol; xylonate; xylose; arabitol; sucrose; fructose; mannitol; glucuronate; erythronate; succinylcarnitine (C4); succinate; fumarate; valerate (5:0); caproate (6:0); heptanoate (7:0); caprate (10:0); laurate (12:0); 5-dodecenoate (12:1n7); 2-hydroxyglutarate; suberate (octanedioate); azelate (nonanedioate; C9); dodecanedioate (C12); tetradecanedioate (C14); hexadecanedioate (C16); 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF); 2-aminoheptanoate; 2-aminooctanoate; propionylcarnitine (C3); propionylglycine (C3); N-octanoylglycine; hydroxybutyrylcarnitine; valerylcarnitine (C5); hexanoylcarnitine (C6); cis-4-decenoyl carnitine; laurylcarnitine (C12); myristoylcarnitine; linoleoylcarnitine; oleoylcarnitine (C18); deoxycarnitine; 3-hydroxybutyrate (BHBA); alpha-hydroxycaproate; 2-hydroxyoctanoate; 2-hydroxystearate; 3-hydroxypropanoate; 3-hydroxyoctanoate; 5-hydroxyhexanoate; 8-hydroxyoctanoate; 16-hydroxypalmitate; oleic ethanolamide; palmitoyl ethanolamide; N-oleoyltaurine; myo-inositol; scyllo-inositol; choline; 1-myristoyl-GPC (14:0); 2-myristoyl-GPC (14:0); 1-pentadecanoylglycerophosphocholine (15:0); 1-palmitoleoyl-GPC (16:1); 2-palmitoleoyl-GPC (16:1); 1-heptadecanoyl-GPC (17:0); 1-oleoyl-GPC (18:1); 2-oleoyl-GPC (18:1); 1-linoleoyl-GPC (18:2); 1-nonadecanoylglycerophosphocholine (19:0); 1-eicosadienoyl-GPC (20:2); 1-arachidoyl-GPC (20:0); 2-eicosatrienoyl-GPC (20:3); 1-arachidonoyl-GPC (20:4); 2-arachidonoyl-GPC (20:4); 1-docosapentaenoyl-GPC (22:5n6); 1-docosahexaenoyl-GPC (22:6); 1-palmitoylplasmenylethanolamine; 1-palmitoyl-GPE (16:0); 2-palmitoyl-GPE (16:0); 1-stearoyl-GPE (18:0); 2-oleoyl-GPE (18:1); 1-linoleoyl-GPE (18:2); 2-linoleoyl-GPE (18:2); 1-eicosatrienoylglycerophosphoethanolamine; 1-docosahexaenoylglycerophosphoethanolamine; 1-palmitoyl-GPI (16:0); 1-linoleoyl-GPI (18:2); 1-arachidonoyl-GPI (20:4); 1-arachidonoylglycerophosphate; glycerol; glycerol 3-phosphate (G3P); 1-myristoylglycerol (14:0); 1-oleoylglycerol (18:1); 1-linoleoylglycerol (18:2); sphinganine; lathosterol; cholesterol; 7-beta-hydroxycholesterol; 21-hydroxypregnenolone disulfate; 5alpha-pregnan-3beta,20beta-diol monosulfate 1; 5alpha-pregnan-3beta,20alpha-diol monosulfate 2; cortisol; corticosterone; cortisone; epiandrosterone sulfate; androsterone sulfate; 4-androsten-3alpha,17alpha-diol monosulfate 3; 5alpha-androstan-3beta,17beta-diol disulfate; cholate; glycocholate; taurochenodeoxycholate; tauro-beta-muricholate; deoxycholate; ursodeoxycholate; glycoursodeoxycholate; tauroursodeoxycholate; glycocholenate sulfate; taurocholenate sulfate; 7-ketodeoxycholate; xanthine; xanthosine; urate; AMP; adenosine 3′,5′-cyclic adenosine monophosphate (cAMP); N6-methyladenosine; N6-carbamoylthreonyladenosine; guanosine; N2,N2-dimethylguanosine; uridine; pseudouridine; 3-ureidopropionate; beta-alanine; N-acetyl-beta-alanine; 5,6-dihydrothymine; 3-aminoisobutyrate; nicotinamide; N1-Methyl-2-pyridone-5-carboxamide; adenosine 5′-diphosphoribose (ADP-ribose); riboflavin (Vitamin B2); threonate; arabonate; alpha-tocopherol; gamma-CEHC glucuronide; heme; bilirubin; pyridoxate; hippurate; 2-hydroxyhippurate (salicylurate); benzoate; catechol sulfate; O-methylcatechol sulfate; 4-methylcatechol sulfate; 4-ethylphenyl sulfate; 4-vinylphenol sulfate; theobromine; theophylline; 1-methylurate; 7-methylxanthine; 2-piperidinone; levulinate (4-oxovalerate); gluconate; cinnamoylglycine; dihydroferulic acid; methyl indole-3-acetate; N-(2-furoyl)glycine; piperine; 4-allylphenol sulfate; methylglucopyranoside (alpha + beta); tartronate (hydroxymalonate); 6-oxopiperidine-2-carboxylic acid; hydroquinone sulfate; salicylate; O-sulfo-L-tyrosine; 2-aminophenol sulfate; 2-ethylhexanoic acid; EDTA; glycerol 2-phosphate; glycolate (hydroxyacetate); pyroglutamylglutamine; betaine; phenylalanylglycine; threonylleucine; alanine; phenylalanyltryptophan; 1,5-anhydroglucitol (1,5-AG); glutamate; serylphenylalanine; glycerate; histidine; valylvaline; threitol; imidazole propionate; lactate; mannose; 2-aminoadipate; arabinose; alpha-ketoglutarate; pipecolate; sorbitol; phosphate; 4-hydroxyphenylacetate; citrate; pelargonate (9:0); 3-(4-hydroxyphenyl)lactate (HPLA); malate; 17-methylstearate; 3-methoxytyrosine; caprylate (8:0); undecanedioate; 2-hydroxyphenylacetate; methylpalmitate (15 or 2); docosadioate; indolepropionate; sebacate (decanedioate); butyrylcarnitine (C4); 5-hydroxyindoleacetate; octadecanedioate (C18); acetylcarnitine (C2); leucine; 2-methylmalonylcarnitine; decanoylcarnitine (C10); isovalerylcarnitine (C5); N-palmitoylglycine; stearoylcarnitine (C18); N-acetylisoleucine; octanoylcarnitine (C8); acetoacetate; 3-hydroxy-2-ethylpropionate; palmitoylcarnitine (C16); 2-hydroxypalmitate; isobutyrylglycine (C4); carnitine; 3-hydroxysebacate; N-formylmethionine; 2-hydroxydecanoate; 12,13-DiHOME; cysteine; 3-hydroxydecanoate; N-palmitoyltaurine; ornithine; 13-HODE + 9-HODE; ethanolamine; N-acetylarginine; N-stearoyltaurine; 2-palmitoyl-GPC (16:0); creatine; glycerophosphorylcholine (GPC); 2-stearoyl-GPC (18:0); 4-acetamidobutanoate; 1-palmitoyl-GPC (16:0); 1-linolenoylglycerophosphocholine (18:3n3); gamma-glutamylalanine; 1-stearoyl-GPC (18:0); 1-eicosatrienoyl-GPC (20:3); gamma-glutamyltryptophan; 2-linoleoyl-GPC (18:2); 1-docosapentaenoyl-GPC (22:5n3); asparagylleucine; 1-eicosenoylglycerophosphocholine (20:1n9); 1-oleoylplasmenylethanolamine; isoleucylalanine; 1-eicosapentaenoylglycerophosphocholine (20:5n3); 1-oleoyl-GPE (18:1); leucylaspartate; 1-stearoylplasmenylethanolamine; 2-arachidonoyl-GPE (20:4); methionylalanine; 2-stearoylglycerophosphoethanolamine; 1-oleoyl-GPI (18:1); phenylalanylisoleucine; 1-arachidonoyl-GPE (20:4); dimethylglycine; 1-stearoyl-GPI (18:0); 1-oleoylglycerophosphoglycerol; 1-stearoylglycerol (18:0); N-acetylthreonine; 1-palmitoylglycerophosphoglycerol; sphingosine; N-acetylaspartate (NAA); 1-palmitoylglycerol (16:0); pregnenolone sulfate; pyroglutamine; sphingosine 1-phosphate; 5alpha-pregnan-3(alpha or beta),20beta-diol disulfate; trans-urocanate; 7-HOCA; 16a-hydroxy DHEA 3-sulfate; N-6-trimethyllysine; 5alpha-pregnan-3beta,20alpha-diol disulfate; 4-androsten-3beta,17beta-diol disulfate 2; 3-methylglutarylcarnitine-2; dehydroisoandrosterone sulfate (DHEA-S); glycochenodeoxycholate
phenyllactate (PLA) 4-androsten-3beta,17beta-diol disulfate 1 taurolithocholate 3-sulfate 4-hydroxyphenylpyruvatetaurocholate glycohyocholate vanillylmandelate (VMA) glycolithocholatesulfate hypoxanthine p-toluic acid hyocholate ADP indoleacetate inosine1-methyladenosine xanthurenate allantoin 1-methylguanosineindole-3-carboxylic acid adenine 5,6-dihydrouracil isovalerylglycine7-methylguanine N4-acetylcytidine isoleucine 5-methyluridine(ribothymidine) trigonelline (N′- 2-hydroxy-3-methylvalerate cytidinemethylnicotinate) pantothenate (Vitamin B5) isobutyrylcarnitine (C4)1-methylnicotinamide gamma-CEHC N-acetylmethionine FAD biliverdin2-hydroxybutyrate (AHB) gamma-tocopherol 4-hydroxyhippurate ureabilirubin (E,E) 3-methyl catechol sulfate 2 dimethylarginine (ADMA +3-hydroxyhippurate SDMA) paraxanthine prolylhydroxyproline 3-methylcatechol sulfate 1 3-methylxanthine N-acetylputrescine caffeine2-isopropylmalate 5-oxoproline 1-methylxanthine homostachydrinegamma-glutamylphenylalanine 1,6-anhydroglucose thymol sulfatealanylleucine erythritol 4-acetylphenyl sulfate glycylleucinestachydrine 2-pyrrolidinone leucylalanine 4-acetaminophen sulfatedimethyl sulfone leucylserine 1,2-propanediol phenylcarnitineiminodiacetate (IDA) 2-hydroxyisobutyrate

TABLE 4 Identified candidate metabolites exhibiting a tail effect

1-arachidonoyl-GPC (20:4); 1-arachidonoyl-GPE (20:4); 1-docosahexaenoylglycerophosphoethanolamine; 1-oleoylplasmenylethanolamine; 1-palmitoyl-GPC (16:0); 1-palmitoylglycerol (16:0); 1-palmitoylplasmenylethanolamine; 1-stearoylglycerol (18:0); 1,5-anhydroglucitol (1,5-AG); 17-methylstearate; 2-hydroxyisobutyrate; 2-isopropylmalate; 2-pyrrolidinone; 3-(3-hydroxyphenyl)propionate; 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF); 3-hydroxyhippurate; 3-indoxyl sulfate; 4-ethylphenyl sulfate; 4-hydroxyphenylpyruvate; 5-hydroxyhexanoate; 5-hydroxyindoleacetate; 8-hydroxyoctanoate; caffeine; caprate (10:0); dihydroferulic acid; dimethylarginine (ADMA + SDMA); ethanolamine; gamma-CEHC; gamma-CEHC glucuronide; hexadecanedioate (C16); homoarginine; hydroxyisovaleroylcarnitine (C5); indoleacetate; indolelactate; isobutyrylglycine (C4); isovalerylglycine; lactate; methionylalanine; methylpalmitate (15 or 2); N-acetylaspartate (NAA); N-formylmethionine; N1-Methyl-2-pyridone-5-carboxamide; p-cresol sulfate; pantothenate (Vitamin B5); phenylacetylglutamine; phenylalanylarginine; pipecolate; serine; serylphenylalanine; sorbitol; urea; 3,4-methylene-heptanoylcarnitine; sulfated methylparaben; cyclo(prolylproline); hydroxy-chlorothalonil; phenylacetylcarnitine; xanthine

Two panels of metabolites (a 12-metabolite panel composed of the metabolites of FIG. 7 and a 21-metabolite panel composed of the metabolites of Table 1) were tested for ASD risk prediction. The results show that the 12- and 21-metabolite panels contributed strongly to prediction of ASD. An overview of the effects of including and excluding metabolites of the 12- or 21-metabolite panel on ASD prediction is shown in FIGS. 14A-D. Whitelists indicate AUC values of classifiers using data from the 12- or 21-metabolite panels only, while blacklists indicate AUC values of classifiers that exclude the 12- or 21-metabolite panels but use other metabolites, either from the group of 84 candidate metabolites or from the full group of 600 metabolites (all_candidates = 84 candidate metabolites; all features = 600 metabolites). Mean shift analysis (top panel) and tail analysis (bottom panel) were performed. These data show that the predictive information for ASD is attributable to metabolites within the 12- or 21-metabolite panels, whether assessed by mean shift or by tail analysis. Thus, the metabolites observed to exhibit strong tail effects (the metabolites of the 12- and 21-metabolite panels) have much greater ASD vs. DD predictive power than the other metabolites of the 600-metabolite panel, which do not exhibit strong tail effects.
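The whitelist/blacklist comparison rests on AUC values. As a minimal sketch, the AUC of any per-sample score can be computed with the Mann-Whitney statistic: the probability that a randomly chosen ASD sample outscores a randomly chosen DD sample, with ties counted as one half. The scorer below, which simply counts how many of the selected metabolites fall in their tail, is a hypothetical stand-in for the logistic regression classifiers described here; the function names and the toy data are illustrative assumptions.

```python
from typing import List, Mapping, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney statistic; labels are 1 = ASD, 0 = DD."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one ASD and one DD sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def tail_count_score(tail_flags: Mapping[str, Sequence[int]], features: Sequence[str]) -> List[int]:
    """Hypothetical stand-in classifier: per-sample number of tail hits among `features`."""
    n_samples = len(next(iter(tail_flags.values())))
    return [sum(tail_flags[f][i] for f in features) for i in range(n_samples)]


# Toy data (illustrative only); 1 = the sample lies in that metabolite's tail.
labels = [1, 1, 1, 0, 0, 0]
tail_flags = {
    "xanthine": [1, 1, 0, 0, 0, 0],
    "p-cresol sulfate": [1, 0, 1, 0, 0, 0],
    "caffeine": [0, 1, 0, 1, 0, 1],
}
whitelist = ["xanthine", "p-cresol sulfate"]               # panel metabolites only
blacklist = [m for m in tail_flags if m not in whitelist]  # all other metabolites

print(auc(tail_count_score(tail_flags, whitelist), labels))  # 1.0
print(auc(tail_count_score(tail_flags, blacklist), labels))  # about 0.333
```

In the toy data the whitelisted metabolites separate the groups perfectly while the remaining one does worse than chance, mirroring in miniature the pattern the figures report.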

FIGS. 14C-D expand the results of FIGS. 14A-B and include additional analyses using Naive Bayes in addition to logistic regression. In addition, FIG. 14B shows results broken up into different cohorts of samples (i.e., "Christmas" and "Easter"). The far left panel shows AUC results in which the classifier was trained on 192 samples and cross-validated on the Christmas cohort only; the middle left panel shows AUC results in which the classifier was trained on 299 samples and cross-validated on the Christmas and Easter cohorts; the middle right panel shows AUC results in which the classifier was trained on, and cross-validated on, samples from the Easter cohort only; and the far right panel shows AUC results in which the classifier was trained on samples from the Christmas and Easter cohorts and cross-validated on the Easter cohort. The highest AUCs were achieved using metabolites within the 12- or 21-metabolite panels (i.e., the metabolites exhibiting tail effects).
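FIGS. 14C-D add Naive Bayes classifiers alongside logistic regression. The sketch below assumes the inputs are binary tail indicators (1 if a sample's level lies in that metabolite's tail) and fits a Bernoulli Naive Bayes model with Laplace smoothing; the class name, the smoothing parameter `alpha`, and the 0 = DD / 1 = ASD encoding are assumptions, not the configuration used for the figures. Training on one cohort and scoring another (for example, fitting on Christmas samples and scoring Easter samples) only requires passing different sample sets to `fit` and `log_odds`.

```python
import math
from typing import Dict, List, Sequence


class BernoulliNaiveBayes:
    """Naive Bayes over binary tail indicators (1 = the sample lies in that metabolite's tail)."""

    def __init__(self, alpha: float = 1.0) -> None:
        if alpha <= 0:
            raise ValueError("alpha must be positive to avoid log(0)")
        self.alpha = alpha  # Laplace smoothing pseudo-count
        self.log_prior: Dict[int, float] = {}
        self.p_tail: Dict[int, List[float]] = {}

    def fit(self, X: Sequence[Sequence[int]], y: Sequence[int]) -> "BernoulliNaiveBayes":
        if set(y) != {0, 1}:
            raise ValueError("labels must contain both 0 (DD) and 1 (ASD)")
        n_features = len(X[0])
        for c in (0, 1):
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Smoothed probability that a class-c sample is in the tail for feature j.
            self.p_tail[c] = [
                (sum(r[j] for r in rows) + self.alpha) / (len(rows) + 2 * self.alpha)
                for j in range(n_features)
            ]
        return self

    def _log_joint(self, c: int, x: Sequence[int]) -> float:
        return self.log_prior[c] + sum(
            math.log(p if xi else 1.0 - p) for p, xi in zip(self.p_tail[c], x)
        )

    def log_odds(self, x: Sequence[int]) -> float:
        """log P(ASD | x) - log P(DD | x); positive values favour ASD."""
        return self._log_joint(1, x) - self._log_joint(0, x)


# Toy cohort (illustrative): two tail indicators per sample, 1 = ASD, 0 = DD.
model = BernoulliNaiveBayes().fit([[1, 1], [1, 0], [0, 0], [0, 0]], [1, 1, 0, 0])
print(model.log_odds([1, 1]))  # log(6), about 1.79
```

The resulting log-odds can be fed to an AUC calculation in the same way as logistic regression scores, which is what makes the two model families directly comparable.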

FIGS. 17A-B, 18A-B and 19A-D further expand the results of FIGS. 14A-D by showing AUC predictions when the 12- or 21-metabolite panels are included (whitelists) or excluded (blacklists), as a function of the number of features added to the statistical analysis. Top panels show results from mean shift analysis, and bottom panels show results from tail effect analysis. Within each individual panel, the bars represent different metabolite panels, as indicated by the symbols below the panel and in the legend.

An exemplary plot of cumulative AUC for ASD risk prediction when subsets of the 21 metabolites are assessed is shown in FIG. 15. In this figure, the x-axis shows the number of metabolites in subsets selected from the group of 21 metabolites, and the y-axis shows the predictive power for ASD (AUC). For each number on the x-axis, a number of random metabolite combinations was analyzed and their AUC values were plotted (dots). The curve shows that AUC increases as the number of metabolites used (selected from the group of 21) increases. At the same time, the figure demonstrates that even subsets with a small number of metabolites (e.g., 3 or 5) exhibit a high AUC. Thus, certain metabolites appear to have particularly informative predictive tails.

Table 5 shows representative subsets of the 21 metabolites from Table 1, containing 3, 4, 5, 6, and 7 metabolites, that yield high AUC values. For each subset size (3, 4, 5, 6, or 7), 50 random selections of metabolite sets were analyzed. For example, for 3-metabolite subsets of the 21-metabolite panel, 50 random 3-metabolite combinations were assessed (out of a total of 1330 possible combinations). For each subset size, the combination with the highest AUC among the 50 random sets is shown. Thus, certain metabolite combinations containing fewer than 21 metabolites yielded high AUC values. Metabolites such as gamma-CEHC, p-cresol sulfate, xanthine, phenylacetylglutamine, isovalerylglycine, octenoylcarnitine, and hydroxy-chlorothalonil appeared in multiple subsets that yielded high AUC values, indicating that these metabolites may be closely related to the ASD status of a patient. Thus, these metabolites, alone or in combination with each other or with additional metabolites, appear to be particularly useful for predicting the ASD risk of a patient.

TABLE 5 Exemplary subsets of metabolites and prediction of ASD

Number of metabolites | Representative subset with high AUC | AUC
3 | gamma-CEHC, isovalerylglycine, p-cresol sulfate | 0.675
4 | octenoylcarnitine, gamma-CEHC, xanthine, phenylacetylglutamine | 0.700
5 | 3-indoxyl sulfate, 3-(3-hydroxyphenyl)propionate, p-cresol sulfate, gamma-CEHC, hydroxy-chlorothalonil | 0.692
6 | phenylacetylglutamine, indoleacetate, xanthine, octenoylcarnitine, hydroxyisovaleroylcarnitine (C5), pantothenate (Vitamin B5) | 0.731
7 | octenoylcarnitine, pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate, xanthine, indoleacetate, 8-hydroxyoctanoate | 0.720
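The subset sizes in Table 5 follow from simple combinatorics: a 21-metabolite panel has C(21, 3) = 1330 three-metabolite subsets but C(21, 7) = 116280 seven-metabolite subsets, so 50 random draws cover only a small fraction of the larger sizes. The sketch below draws 50 distinct random subsets and keeps the best-scoring one. The scoring callback (for example, a cross-validated AUC), the fixed seed, the no-repeat sampling, and the function names are assumptions for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def sample_subsets(panel: Sequence[str], k: int, n_draws: int = 50, seed: int = 0) -> List[Tuple[str, ...]]:
    """Draw n_draws distinct k-metabolite subsets uniformly at random (assumed: no repeats)."""
    total = math.comb(len(panel), k)  # e.g. C(21, 3) = 1330
    if n_draws > total:
        raise ValueError(f"only {total} distinct subsets of size {k} exist")
    rng = random.Random(seed)
    drawn = set()
    while len(drawn) < n_draws:
        drawn.add(tuple(sorted(rng.sample(list(panel), k))))
    return sorted(drawn)


def best_subset(subsets: Sequence[Tuple[str, ...]], score: Callable[[Tuple[str, ...]], float]):
    """Return (subset, score) for the highest-scoring subset, e.g. score = cross-validated AUC."""
    best = max(subsets, key=score)
    return best, score(best)


for k in range(3, 8):
    print(k, math.comb(21, k))
# 3 1330 / 4 5985 / 5 20349 / 6 54264 / 7 116280
```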

Two-metabolite subsets of the 21 metabolites from Table 1 were assessed for ASD predictability as paired combinations. Representative pairs with robust AUC values are shown in Table 6. Similarly, three-metabolite subsets of the 21 metabolites from Table 1 were assessed for ASD predictability as triplet combinations. Representative triplets with robust AUC values are shown in Table 7.

TABLE 6 Exemplary metabolite pairs providing robust AUC

Metabolites | AUC
phenylacetylglutamine, xanthine | 0.651
phenylacetylglutamine, octenoylcarnitine | 0.647
p-cresol sulfate, xanthine | 0.646
isovalerylglycine, p-cresol sulfate | 0.646
octenoylcarnitine, p-cresol sulfate | 0.645
phenylacetylglutamine, isovalerylglycine | 0.643
gamma-CEHC, p-cresol sulfate | 0.641
indoleacetate, p-cresol sulfate | 0.635
gamma-CEHC, xanthine | 0.633
octenoylcarnitine, xanthine | 0.632
isovalerylglycine, pipecolate | 0.632
hydroxy-chlorothalonil, p-cresol sulfate | 0.631
phenylacetylglutamine, indoleacetate | 0.629
pipecolate, p-cresol sulfate | 0.629
phenylacetylglutamine, p-cresol sulfate | 0.628
1,5-anhydroglucitol (1,5-AG), p-cresol sulfate | 0.628
phenylacetylglutamine, lactate | 0.627
p-cresol sulfate, lactate | 0.627
3-(3-hydroxyphenyl)propionate, 3-indoxyl sulfate | 0.625
pantothenate (Vitamin B5), p-cresol sulfate | 0.625

TABLE 7 Exemplary metabolite triplets providing robust AUC

Metabolites | AUC
phenylacetylglutamine, octenoylcarnitine, xanthine | 0.685
phenylacetylglutamine, octenoylcarnitine, indoleacetate | 0.681
phenylacetylglutamine, isovalerylglycine, octenoylcarnitine | 0.678
isovalerylglycine, octenoylcarnitine, p-cresol sulfate | 0.678
isovalerylglycine, octenoylcarnitine, pipecolate | 0.677
indoleacetate, isovalerylglycine, p-cresol sulfate | 0.676
octenoylcarnitine, p-cresol sulfate, xanthine | 0.673
phenylacetylglutamine, isovalerylglycine, xanthine | 0.671
pantothenate (Vitamin B5), p-cresol sulfate, xanthine | 0.671
isovalerylglycine, octenoylcarnitine, lactate | 0.670
phenylacetylglutamine, isovalerylglycine, indoleacetate | 0.670
gamma-CEHC, isovalerylglycine, p-cresol sulfate | 0.670
indoleacetate, octenoylcarnitine, p-cresol sulfate | 0.668
phenylacetylglutamine, pipecolate, xanthine | 0.668
pipecolate, p-cresol sulfate, xanthine | 0.668
octenoylcarnitine, hydroxy-chlorothalonil, p-cresol sulfate | 0.667
phenylacetylglutamine, isovalerylglycine, gamma-CEHC | 0.667
phenylacetylglutamine, xanthine, gamma-CEHC | 0.667
phenylacetylglutamine, p-cresol sulfate, xanthine | 0.666
indoleacetate, hydroxy-chlorothalonil, p-cresol sulfate | 0.666
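Unlike the larger subsets, pairs and triplets can be scored exhaustively: the 21-metabolite panel yields only C(21, 2) = 210 pairs and C(21, 3) = 1330 triplets. The sketch below enumerates every combination with `itertools.combinations` and ranks them by AUC. It assumes binary tail indicators and a hypothetical score (the number of combination members whose level lies in the tail), not the classifiers used to produce Tables 6 and 7; the toy data are illustrative only.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney statistic; labels are 1 = ASD, 0 = DD."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one ASD and one DD sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_combinations(
    tail_flags: Dict[str, Sequence[int]], labels: Sequence[int], k: int, top: int = 20
) -> List[Tuple[float, Tuple[str, ...]]]:
    """Score every k-metabolite combination and return the `top` best by AUC."""
    results = []
    for combo in combinations(sorted(tail_flags), k):
        # Hypothetical score: how many metabolites in the combination are in their tail.
        scores = [sum(tail_flags[m][i] for m in combo) for i in range(len(labels))]
        results.append((auc(scores, labels), combo))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top]


# Toy data (illustrative only); 1 = the sample lies in that metabolite's tail.
labels = [1, 1, 1, 0, 0, 0]
tail_flags = {
    "xanthine": [1, 1, 0, 0, 0, 0],
    "p-cresol sulfate": [1, 0, 1, 0, 0, 0],
    "caffeine": [0, 1, 0, 1, 0, 1],
}
for value, combo in rank_combinations(tail_flags, labels, k=2):
    print(round(value, 3), combo)
```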

Example 5 Validation of the 12-Metabolite Panel Classifier

Data from 180 samples, of which approximately two thirds were ASD, were used to generate a classifier based on the 12 highly informative metabolites shown in FIG. 7. The classifier was then tested for its ability to discriminate ASD from non-ASD (here, DD) in a second cohort of 130 samples. This method provided an unbiased estimate of true predictive performance, corresponding to an AUC of 0.74. A schematic of the process is shown in FIG. 12.
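Example 5 follows a train/test design: a classifier is fitted on one cohort and its AUC is measured on a second cohort that played no part in fitting. The sketch below illustrates that design, assuming plain logistic regression fitted by batch gradient descent on the mean log loss. The learning rate, epoch count, function names, and toy cohorts are illustrative assumptions, and the FIG. 12 pipeline may differ.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5,
                 epochs: int = 1000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log loss; y is 1 = ASD, 0 = DD."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, X: Sequence[Sequence[float]]) -> List[float]:
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one ASD and one DD sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Training cohort (toy): feature 0 tracks the label, feature 1 is uninformative.
X_train, y_train = [[1, 0], [1, 1], [0, 0], [0, 1]], [1, 1, 0, 0]
# Held-out cohort, never seen during fitting.
X_test, y_test = [[1, 0], [0, 0]], [1, 0]

w, b = fit_logistic(X_train, y_train)
print(auc(predict(w, b, X_test), y_test))  # 1.0
```

Only the held-out AUC is reported, because an AUC computed on the training cohort would overstate performance; that is the sense in which the Example 5 estimate is unbiased.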

Example 6 Adding Genetic Information to Metabolites May Improve ASD Risk Prediction

Adding genetic information to metabolite information was found to improve ASD risk prediction for certain groups. For example, combining copy number variation (CNV) data with metabolite information significantly reduces the confidence interval of ASD risk prediction, as shown in FIGS. 16A and 16B. As FIG. 16A demonstrates, adding genetic information further enhances the separation between the ASD and non-ASD groups. In addition to CNV, other genetic information, including, but not limited to, Fragile X syndrome (FXS) status, may further contribute to a diagnostic test that predicts ASD risk with improved accuracy and reduced type I and/or type II errors. As shown in FIG. 16B, including such additional information (e.g., "PathoCV") increased the separation between the ASD and DD groups, and thus helped differentiate between these two conditions.

Example 7 Prominent Biological Pathways Emerging from Metabolite Analysis

Further analysis of metabolite information revealed clusters of the metabolites presented in Table 1 that play prominent roles in distinct biological pathways. For example, 7 of the 21 metabolites (33%) are related to gut microbial activity; these are shown in Table 8. All 7 are amino acid metabolites, and 6 of the 7 are metabolites of aromatic amino acids and contain a benzene ring.

TABLE 8 Seven metabolites involved in gut microbial activity

Metabolite | Change in ASD in the STORY cohort | Benzene ring | Bacterially derived | Original precursor
3-indoxyl sulfate | ASD down | Yes | Yes | Tryptophan
indoleacetate | ASD down | Yes | Yes | Tryptophan
p-cresol sulfate | ASD down | Yes | Yes | Phenylalanine or Tyrosine
4-ethylphenyl sulfate | ASD down | Yes | Yes | Phenylalanine or Tyrosine
phenylacetylglutamine | ASD down | Yes | Yes | Phenylalanine or Tyrosine
3-(3-hydroxyphenyl)propionate | DD down | Yes | Yes | Phenylalanine or Tyrosine
pipecolate | ASD up | No | Yes | Lysine

Analysis of the metabolites that are strongly associated with ASD, as shown in Table 1, reveals connections with certain biological pathways. For example, particular metabolites that provide predictive information for ASD suggested impaired phase II biotransformation, impaired ability to metabolize benzene rings, dysregulation of reabsorption in the kidneys, dysregulation of carnitine metabolism, and imbalanced transport of large neutral amino acids into the brain. Biological pathway information can be further used to improve ASD risk assessment and/or to explore the etiology and pathophysiology of ASD. Such information can also be used to develop therapeutics for the treatment of ASD.

Example 8 Elucidation of Metabolite Concentrations in Blood

Absolute metabolite concentrations in plasma samples were determined by mass spectrometry for 19 of the 21 metabolites described in Example 2. Absolute concentrations in plasma (ng/ml) were calculated using calibration curves generated from standard samples containing known amounts of the metabolites.
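Converting an instrument response into ng/ml is an inverse-prediction problem on a calibration curve. The sketch below assumes an unweighted linear curve (response = slope × concentration + intercept) fitted by ordinary least squares to standards of known concentration. Real LC-MS assays often use internal-standard response ratios and weighted or quadratic fits, so the model form, function names, and numbers here are assumptions for illustration.

```python
from typing import Sequence, Tuple


def fit_calibration(conc: Sequence[float], response: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line through the standards; returns (slope, intercept)."""
    n = len(conc)
    if n < 2 or n != len(response):
        raise ValueError("need at least two paired standards")
    mean_x = sum(conc) / n
    mean_y = sum(response) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, response))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def to_concentration(response: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to get a sample concentration (ng/ml)."""
    return (response - intercept) / slope


# Illustrative standards: 0, 10, 20, 40 ng/ml giving a response of 2 * conc + 5.
slope, intercept = fit_calibration([0, 10, 20, 40], [5, 25, 45, 85])
print(slope, intercept)                        # 2.0 5.0
print(to_concentration(65, slope, intercept))  # 30.0
```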

Table 9A shows seventeen metabolites predictive of ASD. For each metabolite, the direction of the tail effect (left or right), the threshold value for determining the presence of a tail effect (i.e., the 15th percentile for a left tail effect and the 90th percentile for a right tail effect), and the odds ratio (log2) are provided. A positive odds ratio indicates that a tail effect of the metabolite is predictive of ASD.

Table 9B shows seven metabolites predictive of DD. For each metabolite, the direction of the tail effect (left or right), the threshold value for determining the presence of a tail effect (i.e., the 15th percentile for a left tail effect and the 90th percentile for a right tail effect), and the odds ratio (log2) are provided. A negative odds ratio indicates that a tail effect of the metabolite is predictive of DD.
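Tables 9A and 9B report, for each metabolite, a tail direction, a threshold at the 15th (left) or 90th (right) percentile, and a log2 odds ratio. The sketch below shows one way to compute those quantities. Which population the percentiles are taken over (here, a single reference set of concentrations), the percentile interpolation method, and the 2x2 odds-ratio construction with an optional pseudocount are assumptions, not details stated in this document.

```python
import math
import statistics
from typing import Sequence


def tail_threshold(values: Sequence[float], direction: str) -> float:
    """15th percentile for a left tail, 90th percentile for a right tail."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points
    if direction == "left":
        return cuts[14]  # 15th percentile
    if direction == "right":
        return cuts[89]  # 90th percentile
    raise ValueError("direction must be 'left' or 'right'")


def in_tail(value: float, threshold: float, direction: str) -> bool:
    """Left tail: at or below the threshold; right tail: at or above it."""
    return value <= threshold if direction == "left" else value >= threshold


def log2_odds_ratio(tail: Sequence[bool], is_asd: Sequence[bool], pseudo: float = 0.5) -> float:
    """log2 of (odds of ASD inside the tail) / (odds of ASD outside the tail).

    Positive values favour ASD and negative values favour DD. The pseudocount
    (an assumed Haldane-style correction) avoids division by zero for empty cells.
    """
    a = sum(1 for t, y in zip(tail, is_asd) if t and y)              # in tail, ASD
    b = sum(1 for t, y in zip(tail, is_asd) if t and not y)          # in tail, DD
    c = sum(1 for t, y in zip(tail, is_asd) if not t and y)          # outside, ASD
    d = sum(1 for t, y in zip(tail, is_asd) if not t and not y)      # outside, DD
    return math.log2(((a + pseudo) * (d + pseudo)) / ((b + pseudo) * (c + pseudo)))


# 4 ASD and 1 DD sample in the tail; 2 ASD and 4 DD outside it (illustrative counts).
tail = [True] * 5 + [False] * 6
is_asd = [True, True, True, True, False] + [True, True, False, False, False, False]
print(log2_odds_ratio(tail, is_asd, pseudo=0))  # 3.0, i.e. an odds ratio of 8
```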

TABLE 9A Exemplary metabolites with tail enrichment predictive of ASD

Metabolite | Direction of tail effect | Threshold concentration (ng/ml) | Odds ratio (log2)
3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF) | Left | 7.98 | 0.85
3-indoxyl sulfate | Left | 256.7 | 1.7
4-ethylphenyl sulfate | Left | 3.0 | 1.3
5-hydroxyindoleacetate | Right | 28.5 | 2.3
gamma-CEHC | Left | 32.0 | 10.8
hydroxyisovaleroylcarnitine (C5) | Left | 12.9 | 3.0
indoleacetate | Left | 141.4 | 1.5
isovalerylglycine | Left | 1.6 | 1.0
lactate | Right | 686600.0 | 1.61
N1-Methyl-2-pyridone-5-carboxamide | Left | 124.82 | 0.94
p-cresol sulfate | Left | 1220.0 | 1.6
pantothenate (Vitamin B5) | Right | 63.3 | 1.7
phenylacetylglutamine | Left | 166.4 | 1.3
pipecolate | Right | 303.6 | 2.3
xanthine | Right | 182.7 | 1.6
hydroxy-chlorothalonil | Right | 20.3 | 2.2
1,5-anhydroglucitol (1,5-AG) | Left | 11910.3 | 1.9

TABLE 9B Exemplary metabolites with tail enrichment predictive of DD

Metabolite | Direction of tail effect | Threshold concentration (ng/ml) | Odds ratio (log2)
3-(3-hydroxyphenyl)propionate | Left | 5.0 | −1.5
3-indoxyl sulfate | Right | 926.6 | −1.1
isovalerylglycine | Right | 5.2 | −1.3
phenylacetylglutamine | Right | 513.2 | −1.5
xanthine | Left | 88.0 | −1.8
3-hydroxyhippurate | Left | 0.86 | −1.25
1,5-anhydroglucitol (1,5-AG) | Right | 20600.3 | −1.1
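Claim 1 counts how many measured metabolites fall at or beyond the Table 9A (ASD) and Table 9B (DD) thresholds. The sketch below encodes the directions and thresholds from Tables 9A and 9B and returns both counts. The final decision rule (call ASD when ASD tail hits outnumber DD tail hits) is a hypothetical placeholder, since this section does not specify how the counts are combined; the function names are likewise illustrative.

```python
from typing import Dict, Tuple

Table = Dict[str, Tuple[str, float]]

# (tail direction, threshold in ng/ml), transcribed from Table 9A (predictive of ASD).
ASD_TAILS: Table = {
    "3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF)": ("left", 7.98),
    "3-indoxyl sulfate": ("left", 256.7),
    "4-ethylphenyl sulfate": ("left", 3.0),
    "5-hydroxyindoleacetate": ("right", 28.5),
    "gamma-CEHC": ("left", 32.0),
    "hydroxyisovaleroylcarnitine (C5)": ("left", 12.9),
    "indoleacetate": ("left", 141.4),
    "isovalerylglycine": ("left", 1.6),
    "lactate": ("right", 686600.0),
    "N1-Methyl-2-pyridone-5-carboxamide": ("left", 124.82),
    "p-cresol sulfate": ("left", 1220.0),
    "pantothenate (Vitamin B5)": ("right", 63.3),
    "phenylacetylglutamine": ("left", 166.4),
    "pipecolate": ("right", 303.6),
    "xanthine": ("right", 182.7),
    "hydroxy-chlorothalonil": ("right", 20.3),
    "1,5-anhydroglucitol (1,5-AG)": ("left", 11910.3),
}

# (tail direction, threshold in ng/ml), transcribed from Table 9B (predictive of DD).
DD_TAILS: Table = {
    "3-(3-hydroxyphenyl)propionate": ("left", 5.0),
    "3-indoxyl sulfate": ("right", 926.6),
    "isovalerylglycine": ("right", 5.2),
    "phenylacetylglutamine": ("right", 513.2),
    "xanthine": ("left", 88.0),
    "3-hydroxyhippurate": ("left", 0.86),
    "1,5-anhydroglucitol (1,5-AG)": ("right", 20600.3),
}


def count_tail_hits(levels: Dict[str, float], table: Table) -> int:
    """Number of measured metabolites at or below (left) / at or above (right) threshold."""
    hits = 0
    for name, (direction, threshold) in table.items():
        if name not in levels:
            continue  # metabolite not measured for this subject
        value = levels[name]
        if (direction == "left" and value <= threshold) or (
            direction == "right" and value >= threshold
        ):
            hits += 1
    return hits


def call(levels: Dict[str, float]) -> str:
    """Hypothetical decision rule comparing the ASD and DD tail counts."""
    asd, dd = count_tail_hits(levels, ASD_TAILS), count_tail_hits(levels, DD_TAILS)
    if asd > dd:
        return "ASD"
    if dd > asd:
        return "DD"
    return "indeterminate"


subject = {"xanthine": 200.0, "gamma-CEHC": 30.0, "pipecolate": 100.0}
print(count_tail_hits(subject, ASD_TAILS), count_tail_hits(subject, DD_TAILS), call(subject))
# 2 0 ASD
```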

What is claimed is:
 1. A method of differentiating between autism spectrum disorder (ASD) and non-ASD developmental delay (DD) in a subject, the method comprising: (i) measuring the levels of a plurality of metabolites in a sample obtained from the subject, wherein the plurality of metabolites comprises at least two metabolites selected from the group consisting of xanthine, gamma-CEHC, hydroxy-chlorothalonil, 5-hydroxyindoleacetate (5-HIAA), indoleacetate, p-cresol sulfate, 1,5-anhydroglucitol (1,5-AG), 3-(3-hydroxyphenyl)propionate, 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl sulfate, 4-ethylphenyl sulfate, hydroxyisovaleroylcarnitine (C5), isovalerylglycine, lactate, N1-Methyl-2-pyridone-5-carboxamide, pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate, 3-hydroxyhippurate, and combinations thereof; and (ii) calculating the number of metabolites in the sample with a level at or below a predetermined threshold concentration: (a) indicative of ASD (ASD left tail effect) as defined in Table 9A, or (b) indicative of DD (DD left tail effect) as defined in Table 9B; and/or (iii) calculating the number of metabolites in the sample with a level at or above a predetermined threshold concentration: (a) indicative of ASD (ASD right tail effect) as defined in Table 9A, or (b) indicative of DD (DD right tail effect) as defined in Table 9B; and (iv) determining that the subject has ASD or DD based on the number obtained in steps (ii) and/or (iii).
 2. The method of claim 1, wherein the plurality of metabolites comprises xanthine and gamma-CEHC.
 3. The method of claim 1, wherein the sample is a plasma sample.
 4. The method of claim 1, wherein the metabolite levels are measured by mass spectrometry.
 5. The method of claim 1, wherein the subject is no greater than about 54 months of age.
 6. The method of claim 1, wherein the subject is no greater than about 36 months of age.
 7. A method for determining that a subject has or is at risk for ASD, the method comprising: (i) measuring the levels of a plurality of metabolites in a sample obtained from the subject, wherein the plurality of metabolites comprises at least two metabolites selected from the group consisting of xanthine, gamma-CEHC, hydroxy-chlorothalonil, 5-hydroxyindoleacetate (5-HIAA), indoleacetate, p-cresol sulfate, 1,5-anhydroglucitol (1,5-AG), 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), 3-indoxyl sulfate, 4-ethylphenyl sulfate, hydroxyisovaleroylcarnitine (C5), isovalerylglycine, lactate, N1-Methyl-2-pyridone-5-carboxamide, pantothenate (Vitamin B5), phenylacetylglutamine, pipecolate, 3-hydroxyhippurate, and combinations thereof; and (ii) detecting two or more of: (a) xanthine at a level at or above 182.7 ng/ml; (b) hydroxy-chlorothalonil at a level at or above 20.3 ng/ml; (c) 5-hydroxyindoleacetate at a level at or above 28.5 ng/ml; (d) lactate at a level at or above 686600.0 ng/ml; (e) pantothenate at a level at or above 63.3 ng/ml; (f) pipecolate at a level at or above 303.6 ng/ml; (g) gamma-CEHC at a level at or below 32.0 ng/ml; (h) indoleacetate at a level at or below 141.4 ng/ml; (i) p-cresol sulfate at a level at or below 182.7 ng/ml; (j) 1,5-anhydroglucitol (1,5-AG) at a level at or below 11910.3 ng/ml; (k) 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF) at a level at or below 7.98 ng/ml; (l) 3-indoxyl sulfate at a level at or below 256.7 ng/ml; (m) 4-ethylphenyl sulfate at a level at or below 3.0 ng/ml; (n) hydroxyisovaleroylcarnitine (C5) at a level at or below 12.9 ng/ml; (o) N1-Methyl-2-pyridone-5-carboxamide at a level at or below 124.82 ng/ml; and (p) phenylacetylglutamine at a level at or below 166.4 ng/ml; and (iii) determining that the subject has or is at risk for ASD based on the metabolite levels detected in step (ii).
 8. The method of claim 7, wherein the plurality of metabolites comprises xanthine and gamma-CEHC.
 9. The method of claim 7, wherein the sample is a plasma sample.
 10. The method of claim 7, wherein the metabolite levels are measured by mass spectrometry.
 11. The method of claim 7, wherein the subject is no greater than about 54 months of age.
 12. The method of claim 7, wherein the subject is no greater than about 36 months of age.